


Vol. 39, Parts 1 and 2 May 1952


BIOMETRIKA 


FOUNDED BY 
W. F. R. WELDON, FRANCIS GALTON and KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON 


ASSOCIATE EDITORS 


M. G. KENDALL JOHN WISHART 
in consultation with
HARALD CRAMER R. C. GEARY


J. B. S. HALDANE


Reprinted by offset-litho 1963 


ISSUED BY 
THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 


PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


[Issued 4 June 1952]























BIOMETRIKA PUBLICATIONS: BOOKS OF TABLES 





Issued by the Cambridge University Press, Bentley House, London, N.W.1 
and obtainable from any bookseller 


Tables of Complete and Incomplete B-Function 
EDITED BY KARL PEARSON 
59 pages of Introduction and 494 pages of Tables 


Price: 55s. net


Tables of the Incomplete Γ-Function
EDITED BY KARL PEARSON 
31 pages of Introduction and 164 pages of Tables 


Price: 42s. net 


Tables of the Complete and Incomplete Elliptic Integrals 


(from LEGENDRE’S Traité des Fonctions Elliptiques. With autographed portrait of LEGENDRE) 
39 pages of Introduction by KARL PEARSON and 94 pages of Tables 


Price: 12s. 6d. net 


Tables of the Ordinates and Probability Integral of the 
Distribution of the Correlation Coefficient in Small Samples 
By F.N. DAVID 
38 pages of Introduction, 55 pages of Tables, 10 Diagrams and 4 Charts 


Price: 10s. net 


NEW PUBLICATION 
Biometrika Tables for Statisticians 


The two volumes of Tables for Statisticians and Biometricians are now out of print and will not be re- 
issued. 

At the request of the Biometrika Trustees a complete recasting of these Tables has been undertaken 
by Professor E. S. PEARSON and Dr H. O. HARTLEY. Many of the old tables will be set aside or modified, 
tables which have been published during the last fifteen years in Biometrika will be reproduced and some 
new tables will be added. 

Volume I of the new series, which will include the statistical and auxiliary mathematical tables in more 
common use is now ready for Press. It will contain 54 tables covering about 135 pp. and an Introduction. 


































NEW STATISTICAL TABLES: SEPARATES RE-ISSUED 
FROM BIOMETRIKA 


To be obtained from 


THE BIOMETRIKA OFFICE; UNIVERSITY COLLEGE, LONDON, W.C.1 


I. From Biometrika, Vols. 22 and 27
Tests of Normality. By E. S. PEARSON and R. C. GEARY 
Price One Shilling, post free 


II. From Biometrika, Vol. 32, Part 2, pp. 168-181 and 188-189
(1) Table of percentage points of the incomplete beta-function
(2) Table of percentage points of the χ² distribution
Stitched together with introductory matter. Price Two Shillings and Sixpence, post free 


III. From Biometrika, Vol. 32, Parts 3 and 4, pp. 300-310
(1) Table of the probability integral of the range in samples from a normal population 
(2) Table of the percentage points of the range 
(3) Table of the percentage points of the t-distribution 
Stitched together with introductory matter. Price Two Shillings and Sixpence, post free 


IV. From Biometrika, Vol. 33, Part 1, pp. 73-88 
Table of percentage points of the inverted beta (F) distribution 


With introductory matter. Price Two Shillings and Sixpence, post free 


V. From Biometrika, Vol. 33, Part 3, pp. 252-265 


(1) Table of the probability integral of the mean deviation in samples from a normal 
population 
(2) Table of the percentage points of the mean deviation 


Stitched together with introductory matter. Price Two Shillings and Sixpence, post free 


VI. From Biometrika, Vol. 33, Part 4, pp. 296-304
Table for testing the homogeneity of a set of estimated variances 
With introductory matter. Price Two Shillings, post free 


VII. From Biometrika, Vol. 35, Parts 1 and 2, pp. 145-156
Table of significance levels for the Fisher-Yates test of significance in 2×2 contingency
tables. By D. J. FINNEY 


With introductory matter. Price Two Shillings and Sixpence, post free 


VIII. From Biometrika, Vol. 35, Parts 1 and 2, pp. 191-201


Table for the calculation of working probits and weights in probit analysis. By D. J. FINNEY 
and W. L. STEVENS 


With introductory matter. Price Two Shillings and Sixpence, post free 


IX. From Biometrika, Vol. 36, Parts 3 and 4, pp. 267-289 
Tables of autoregressive series. By M. G. KENDALL 
With introductory matter. Price Two Shillings and Sixpence, post free 


X. From Biometrika, Vol. 36, Parts 3 and 4, pp. 431-449 
Tables of symmetric functions, Part I. By F. N. DAVID and M. G. KENDALL
With introductory matter. Price Two Shillings and Sixpence, post free 










NEW STATISTICAL TABLES: continued
XI. From Biometrika, Vol. 35, Parts 1 and 2, pp. 118-144 


The distribution of the extreme deviate from the sample mean and its “studentized” form. 
By K. R. NAIR 


Price Two Shillings and Sixpence, post free 
XII. From Biometrika, Vol. 37, pp. 168-172 and pp. 313-325


(1) Table of the probability integral of the t-distribution 


(2) Table of the χ² integral, and of the cumulative Poisson distribution. By H. O. HARTLEY
and E. S. PEARSON 


Stitched together with introductory matter. Price Five Shillings, post free 
XIII. From Biometrika, Vol. 38, Parts 1 and 2, pp. 112-130


Charts of the power function for analysis of variance tests, derived from the non-central 
F-distribution. By E. S. PEARSON and H. O. HARTLEY 


With introductory matter. Price Two Shillings and Sixpence, post free 
XIV. From Biometrika, Vol. 38, Parts 3 and 4, pp. 435-462 
Tables of symmetric functions. Parts II and III. By F. N. DAVID and M. G. KENDALL
With introductory matter. Price Four Shillings, post free 
XV. From Biometrika, Vol. 38, Parts 3 and 4, pp. 423-426 


A chart for the incomplete beta-function and the cumulative binomial distribution. By H. O. 
HARTLEY and E. R. FITCH 


With introductory matter and ruler scale. Price Two Shillings and Sixpence, post free 


NEW PUBLICATIONS 
LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGS’s
Arithmetica Logarithmica). 


Part II, containing the logarithms of numbers 20,000-30,000, which constitutes the ninth and final 
section of the Table, was finished some years ago but publication was postponed until the author’s General 
Introduction to the complete work had been written, revised and set up in type. 

This Introduction deals with the methods of interpolation required and describes in some detail the 
mode of construction of the table and the steps taken to safeguard the accuracy of printing. Much of 
the material is of general interest to table makers and table users, being the fruit of Dr Thompson’s long 
experience gained in the production of this great standard table. At the end of the Introduction are
included as Appendices:


(i) a Life of Henry Briggs by Thomas Smith, 1707, translated from the Latin by J. T. Foxell; 
(ii) a table of the errors in Briggs’s Arithmetica Logarithmica of 1624; 
(iii) certain Auxiliary Tables. 


PRICE: Part II (100 pp.) and Introductory matter (98 pp.) 45 shillings. 


After publication, which it is expected will occur this autumn, steps will be taken to make available in 
two bound volumes, the nine separate sections of the table (logarithms of numbers 10,000-100,000) 
together with the General Introduction. 


ISSUED BY THE CAMBRIDGE UNIVERSITY PRESS 


on behalf of the 
DEPARTMENT OF STATISTICS, UNIVERSITY COLLEGE, LONDON 



















BIOMETRIKA PUBLICATIONS 


Issued by the 
Cambridge University Press, Bentley House, London, N.W. 1 
and obtainable from any bookseller 


The Chances of Death and Other Studies in 
Evolution. Vols. I and II 


By KARL PEARSON, F.R.S.


Price 30s. net


The Life, Letters and Labours of Francis Galton
Vols. I, II, IIIA, and IIIB


By KARL PEARSON, F.R.S.
Price £3. 3s. net 


Karl Pearson: An Appreciation of Some Aspects of 
his Life and Work 


By E. S. PEARSON 
Price 10s. 6d. net 


A Bibliography of the Statistical and Other 
Writings of Karl Pearson 
COMPILED BY G. M. MORANT, WITH THE ASSISTANCE OF B. L. WELCH 


Price 6s. net 


“Student’s” Collected Papers


EDITED BY E. S. PEARSON and JOHN WISHART
WITH A FOREWORD BY LAUNCE MCMULLEN 
Price 15s. net 


Karl Pearson’s Early Statistical Papers 


Reprinted by photo-lithography for the Biometrika Trust, with the permission of the original publishers.
The Volume contains eleven papers, including the more important of the memoirs entitled “Mathematical
Contributions to the Theory of Evolution”, first published in the Philosophical Transactions
of the Royal Society. The original paper deriving the χ²-distribution, published in 1900 in the
Philosophical Magazine, is also included.
Price 21s. net































JOURNAL 


OF THE 


ROYAL STATISTICAL SOCIETY 


THE JOURNAL OF THE ROYAL STATISTICAL SOCIETY is published in two
series: SERIES A (GENERAL), four times a year, 15s. each part, annual subscription
£3. 1s. post free; SERIES B (METHODOLOGICAL), two issues a year,
22s. 6d. each part, annual subscription 45s. 6d. post free.


CONTENTS OF FORTHCOMING ISSUES 


SERIES A (GENERAL), VOL. CXV, PART I, 1951 


The Econometrics of Family Budgets. By H. S. HOUTHAKKER. (With Discussion.) 
The Interdependence of the British Economy. By T. BARNA. (With Discussion.) 
Index Numbers of the Real Product of the United Kingdom. By C. F. CARTER. 
The Teaching of Statistics in Schools. A Report of the Council. 

Obituary: G. UDNY YULE, C.B.E., F.R.S.


Reviews of Books, Statistical and Current Notes, Periodical Returns, Additions to Library. 


SERIES B (METHODOLOGICAL), VOL. XIII, No. 2, 1951 


Some Problems in the Theory of Queues. By DAVID G. KENDALL. (With Discussion.)
The Theory of Position Finding. By H. E. DANIELS. (With Discussion.)

A Review of the Literature of Systematic Sampling. By W. R. BUCKLAND. 

On Two-Stage Sampling. By V. P. GODAMBE. 

On Certain Probability Distributions arising from Points on a Line. By B. V. SUKHATME. 
The Estimation of Standard Error from Successive Finite Differences. By P. G. GUEST.
The Interpretation of Interaction in Contingency Tables. By E. H. SIMPSON.

Complex Contingency Tables Treated by the Partition of χ². By H. O. LANCASTER.

The Variance of Least-Square Estimates under Linear Restraints. By S. ROSENBAUM. 
Change-over Trials. By H. D. PATTERSON. 


A General Technique for the Analysis of Experiments with Incorrectly Treated Plots. 
By P. M. GRUNDY.


Subjective Judgment in Statistical Analysis: An Experimental Study. By D. J. FINNEY. 


The Expression of the Complementary Outputs of Two Products in Terms of a Common 
Unit of Production Effort. By G. H. JOWETT.


Inversions and Rank Correlation Coefficients. By J. DURBIN AND A. STUART. 
Note on Durbin and Stuart's Formula for E(r_s). By H. E. DANIELS.


ROYAL STATISTICAL SOCIETY 
4 Portugal Street, London, W.C. 2 
































Annals of Eugenics 


A JOURNAL OF HUMAN GENETICS 
EDITED BY L. S. PENROSE 


Vol. XVI, Pt. IV May 1952 
Further data on genetics of microcythaemia or thalassaemia minor and Cooley’s disease or thalassaemia 
major. By I. BIANCO, G. MONTALENTI, E. SILVESTRONI and M. SINISCALCO.
Body weight at different ages and heights. By W. F. KEMSLEY. 
Blood groups in Jews from the Yemen. By A. BRZEZINSKI, J. GUREVITCH, D. HERMONI and G. MUNDEL. 
A statistical study of human twinning. By NORMA MCARTHUR.
Blood agglutinogens of the Mexicans. By C. ARTEAGA, M.S. MALLEN, A. V. BRozco and ELENA L. UGALDO. 
Simple tests for bimodality and bitangentiality. By J. B. S. HALDANE. 
Birth weight and length of gestation of twins, together with maternal age, parity, and survival. By Mary N. 
KARN. 
Data on the genetics of birth weight. By L. S. PENROSE. 


Subscription price 57s. net per volume of four quarterly parts. Single issues 15s. (postage extra) 


CAMBRIDGE UNIVERSITY PRESS 
BENTLEY HOUSE, 200 EUSTON ROAD, LONDON, N.W. 1 























The Annals of Mathematical Statistics 


Issued Quarterly by the Institute of Mathematical Statistics 


VOL. 23, NO. I CONTENTS MARCH, 1952 


Abraham Wald, 1902-1950. J. WOLFOWITZ. 

The Formative Years of Abraham Wald and His Work in Geometry. KARL MENGER. 

Abraham Wald’s Contributions to Econometrics. G. TINTNER. 

The Publications of Abraham Wald. 

On the Power Function of Tests of Randomness Based on Runs Up and Down. HOWARD LEVENE.
On the Structure of Balanced Incomplete Block Designs. W. S. CONNOR, JR. 

Formulas for the Group Sequential Sampling of Attributes. HOWARD L. JONES. 

An Application of Information Theory to Multivariate Analysis. S. KULLBACK. 


Corrections for Non-normality in the Use of Two-Sample t- and F-Tests at High Significance 
Levels. RALPH ALLAN BRADLEY. 


A Bayes Approach to a Quality Control Model. M. A. GIRSHICK and HERMAN RUBIN.
Testing a Straggler Mean in a Two-way Classification Using the Range. JACK MOSHMAN. 
Note on Wilcoxon’s Two-Sample Test when Ties Are Present. J. HEMELRIJK. 
Subscription rate $10.00 per year inside the Western Hemisphere and 
$5.00 per year outside the Western Hemisphere 


INQUIRIES AND SUBSCRIPTION ORDERS SHOULD BE SENT TO 


CARL H. FISCHER, Secretary, THE INSTITUTE OF MATHEMATICAL STATISTICS 
UNIVERSITY OF MICHIGAN, ANN ARBOR, MICHIGAN 
















ECONOMETRICA 


JOURNAL OF THE ECONOMETRIC SOCIETY 


Contents for Vol. 20, April 1952, Include: 

A. CHARNES, W. W. COOPER AND B. MELLON: Blending Aviation Gasolines—A Study in Programming
Interdependent Activities in an Integrated Oil Company. 

A. CHARNES: Optimality and Degeneracy in Linear Programming. 

M. J. FARRELL: Irreversible Demand Functions. 

A. DVORETZKY, J. KIEFER AND J. WOLFOWITZ: The Inventory Problem: 1. Case of Known Distributions of
Demand. 

MICHIO MORISHIMA: Consumer's Behavior and Liquidity Preference.

HERBERT A. SIMON: On the Application of Servomechanism Theory in the Study of Production Control. 

FRANK E. BOTHWELL: The Method of Equivalent Linearization. 

MARTIN SHUBIK: A Business Cycle Model with Organized Labor Considered. 

GERARD DEBREU: Definite and Semidefinite Quadratic Forms.

M. HATANAKA: Note on Consolidation Within a Leontief System.

ABRAHAM WALD: On a Relation Between Changes in Demand and Price Changes. 

REPORT OF THE LOUVAIN MEETING. 

REPORT OF THE NEW DELHI AND PATNA MEETINGS. BOOK REVIEWS: 

Published Quarterly Subscription rate available on request 


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics. 
Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to 
WILLIAM B. SIMPSON, Secretary 
THE ECONOMETRIC SOCIETY, THE UNIVERSITY OF CHICAGO 
CHICAGO 37, ILLINOIS, U.S.A. 


SANKHYA 


THE INDIAN JOURNAL OF STATISTICS 
EDITED BY P. C. MAHALANOBIS 


VoL. Il CONTENTS 


The estimation of parameters in certain stochastic processes. By HENRY B. MANN 

Statistical inference applied to classificatory problems. By C. RADHAKRISHNA RAO 

Multivariate binomial and Poisson distributions. By A. S. KRISHNAMOORTHY 

On errors of estimates in various types of double sampling procedure. By K. C. SEAL 

Estimation of parameters from incomplete data with application to design of sample surveys. By ABRAHAM 
MATTHAI 

Confluent hypergeometric function. By PRAN NATH 


A study of recent trend in infantile mortality rates in Calcutta by longitudinal survey. By K. N. MITRA,
BIMALA BHATTACHARYA, KAMALA DEY, C. S. DAWN, MERCIA OBADIAH AND A. K. GAYEN


On the non-existence of certain difference sets for incomplete group designs. By S. S. SHRIKHANDE 
On the non-existence of affine resolvable balanced incomplete block designs. By S. S. SHRIKHANDE 
A note on the power of the best critical region for increasing sample size. By D. BASU
Some further results on errors in double sampling technique. By CHAMELI BOSE
A note on price-wage variations in cottage and factory economy. By G. C. MANDAL
BOOK REVIEWS

Annual Subscription: 30 rupees 


INQUIRIES AND ORDERS MAY BE ADDRESSED TO 
THE EDITOR OF SANKHYA 
PRESIDENCY COLLEGE, CALCUTTA, INDIA 






































JOURNAL OF THE 
AMERICAN STATISTICAL ASSOCIATION JUNE 1952 
1108 16th Street, N.W., Washington 6, D.C. Vol. 47, No. 258 


Estimation for Sub-Sampling Designs Employing the County as a Primary 
Sampling Unit Emil H. Jebe 


Some Applications of Statistics for Auditing John Neter 


Latent Structure Analysis and its Relation to Factor Analysis 
Bert F. Green, Jr. 
Fertility Trends and Differentials in the U.S. Clyde V. Kiser 


Book Reviews 


The American Statistical Association invites as members all persons 
interested in: 


1. Development of new theory and method. 
2. Improvement of basic statistical data. 
3. Application of statistical methods to practical problems. 














ACCEPTANCE SAMPLING 


A symposium—$1.50 


ACCEPTANCE SAMPLING BY ATTRIBUTES 
Developments Prior to 1941 Paul Peach 
Wartime Developments E. G. Olds 


ACCEPTANCE SAMPLING BY VARIABLES 


Acceptance Sampling by Variables, with Special Reference to the Case in which 


Quality is Measured by Average or Dispersion J. H. Curtiss 
Use of Variables in Acceptance Inspection for Percent Defective W. Allen Wallis
Chairman's Closure J. W. Tukey 


AMERICAN STATISTICAL ASSOCIATION 
1603 K Street, N.W., Washington 6, D.C. 










































VOLUME 39, PARTS 1 AND 2 APRIL 1952





MOMENT COEFFICIENTS OF THE k-STATISTICS IN SAMPLES 
FROM A FINITE POPULATION 


By JOHN WISHART 
Statistical Laboratory, University of Cambridge 


1. Tukey (1950) has shown how it is possible to simplify the presentation of the moment
coefficients of the distribution of certain of the k-statistics in samples from a finite population,
illustrating in particular by calculating the variances of $k_1$ and $k_2$, and their covariance.
His method is to introduce a new sample statistic $k_{rs\ldots}$ and to work out once and for all
certain non-linear functions of these as linear functions of themselves. This is much of the
labour of calculating the sampling moment coefficients; the subsequent work consists of
selecting those that are required and putting them together.

It should be mentioned that Dressel (1940), following the work of Dwyer (1938), introduced
statistics $L_{rs\ldots}$ $(r, s, \ldots \neq 1)$ as unbiased estimates of products $\lambda_r\lambda_s\ldots$ of the Thiele
seminvariants. These are the same quantities as Tukey’s $k_{rs\ldots}$ although, because of the
stated limitation to the values of $r, s, \ldots$, they do not form a complete set.

It is of some interest to carry the calculations further and derive new results. In this paper 
the formulae are extended to the 6th order, and all moment coefficients evaluated to this 
order. This will carry us a good way beyond the results of Irwin & Kendall (1944). Finally, 
certain basic formulae are given for the 7th and 8th orders. 


2. Using the notation of David & Kendall (1949) for the symmetric functions of the
n sample observations, we may write $k_{rs\ldots}$ in general as

$$k_{rs\ldots} = \sum \frac{(-)^{\Sigma(\rho-1)}\,\Pi(\rho-1)!\;\, r!\,s!\,t!\ldots}{\Pi\{\pi_1!\,\pi_2!\ldots\}\;\Pi\{(p_1!)^{\pi_1}(p_2!)^{\pi_2}\ldots\}}\;\frac{[\Pi(p_1^{\pi_1}p_2^{\pi_2}\ldots)]}{n^{(\Sigma\rho)}}, \tag{2·1}$$

where, for the first suffix of k, namely, r, we have a partition $(p_1^{\pi_1}p_2^{\pi_2}\ldots)$ of the integer r such
that $\Sigma(p\pi) = r$, and having $\rho$ parts, so that $\Sigma(\pi) = \rho$. There are similar partitions of the
suffixes s, t, ..., and inside the large summation sign the symbols $\Pi$ and $\Sigma$ are used to denote
multiplications and sums to cover all of r, s, t, ..., while the large outside summation sign
denotes that we must sum over all partitions. This definition includes Fisher’s $k_r$ as particular
cases.
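The definition (2·1) can be sketched in a few lines of Python. This is only an illustrative reading of the formula, not the author's procedure: the function names (`partitions`, `suffix_terms`, `polykay_coefficients`) are hypothetical, partitions are written as non-increasing tuples, and the returned coefficient of a partition with m parts is understood to multiply the augmented symmetric function divided by $n^{(m)}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def partitions(r, largest=None):
    """Yield the partitions of the integer r as non-increasing tuples."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def suffix_terms(r):
    """For one suffix r, yield (partition, coefficient) with coefficient
    (-1)^(rho-1) (rho-1)! r! / (prod pi_i! * prod (p_i!)^pi_i)."""
    for p in partitions(r):
        rho = len(p)
        denom = 1
        for part, mult in Counter(p).items():
            denom *= factorial(mult) * factorial(part) ** mult
        sign = (-1) ** (rho - 1)
        yield p, Fraction(sign * factorial(rho - 1) * factorial(r), denom)


def polykay_coefficients(suffixes):
    """Return {partition: c}: k_{rs...} is the sum of c * [partition] / n^(m),
    m being the number of parts of the partition."""
    out = Counter()
    for combo in product(*(list(suffix_terms(r)) for r in suffixes)):
        # Merge the partitions chosen for the separate suffixes into one bracket.
        merged = tuple(sorted((x for p, _ in combo for x in p), reverse=True))
        coeff = Fraction(1)
        for _, c in combo:
            coeff *= c
        out[merged] += coeff
    return dict(out)
```

Calling `polykay_coefficients((3, 2))` collects the contributions of every pair of partitions of 3 and 2, so that terms falling on the same bracket are added together automatically.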





To illustrate let us work out $k_{32}$. The partitions of 3 are 3, 21 and $1^3$, and of 2 they are
2 and $1^2$. We shall therefore have terms in $[32]$, $[31^2]$, $[2^21]$, $[21^3]$ and $[1^5]$. The first term is

$$\frac{(-)^{0+0}\,0!\,.\,0!}{1!\,.\,1!}\;\frac{3!\,.\,2!}{(3!)\,.\,(2!)}\;\frac{[3.2]}{n^{(2)}} = \frac{[32]}{n^{(2)}}.$$

This is followed by

$$\frac{(-)^{0+1}\,0!\,.\,1!}{1!\,.\,2!}\;\frac{3!\,.\,2!}{(3!)\,.\,(1!)^2}\;\frac{[3.1^2]}{n^{(3)}} = -\frac{[31^2]}{n^{(3)}}$$

and

$$\frac{(-)^{1+0}\,1!\,.\,0!}{(1!\,1!)\,.\,1!}\;\frac{3!\,.\,2!}{(2!\,1!)\,.\,(2!)}\;\frac{[21.2]}{n^{(3)}} = -\frac{3[2^21]}{n^{(3)}}.$$

The fourth term, however, consists of two parts, namely,

$$\frac{(-)^{2+0}\,2!\,.\,0!}{3!\,.\,1!}\;\frac{3!\,.\,2!}{(1!)^3\,.\,(2!)}\;\frac{[1^3.2]}{n^{(4)}} = \frac{2[21^3]}{n^{(4)}}$$

and

$$\frac{(-)^{1+1}\,1!\,.\,1!}{(1!\,1!)\,.\,2!}\;\frac{3!\,.\,2!}{(2!\,1!)\,.\,(1!)^2}\;\frac{[21.1^2]}{n^{(4)}} = \frac{3[21^3]}{n^{(4)}},$$

making $5[21^3]/n^{(4)}$ in all, while the last term is

$$\frac{(-)^{2+1}\,2!\,.\,1!}{3!\,.\,2!}\;\frac{3!\,.\,2!}{(1!)^3\,.\,(1!)^2}\;\frac{[1^3.1^2]}{n^{(5)}} = -\frac{2[1^5]}{n^{(5)}}.$$

With practice such formulae can readily be developed by combinatorial methods.
Inversely, we can determine the augmented symmetric functions in terms of the k’s, the
general formula being


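$$\frac{[rs\ldots]}{n^{(m)}} = \sum \Pi\left\{\frac{r!}{\{\pi_1!\,\pi_2!\ldots\}\,\{(p_1!)^{\pi_1}(p_2!)^{\pi_2}\ldots\}}\right\}\; k_{p_1p_1\ldots(\pi_1\ \text{factors})\;p_2p_2\ldots(\pi_2\ \text{factors})\ldots}, \tag{2·2}$$

the divisor on the left being $n^{(m)}$ for a bracket of m parts, as in the row labels of the tables below.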








In the tables below, both sets of formulae are given. To find a k in terms of augmented 
symmetrics read by columns from the top down to and including the diagonal, the unit 
entry in which is shown in bold italics. To find an augmented symmetric in terms of k’s read 
horizontally up to and including the italicized diagonal. 


3. CONVERSION TABLE 




































































1st order      k_1 = [1]/n

2nd order
                    k_11     k_2
[1^2]/n^(2)            1      -1
[2]/n                  1       1

3rd order
                   k_111    k_21     k_3
[1^3]/n^(3)            1      -1       2
[21]/n^(2)             1       1      -3
[3]/n                  1       3       1

4th order
                  k_1111   k_211    k_22    k_31     k_4
[1^4]/n^(4)            1      -1       1       2      -6
[21^2]/n^(3)           1       1      -2      -3      12
[2^2]/n^(2)            1       2       1       .      -3
[31]/n^(2)             1       3       .       1      -4
[4]/n                  1       6       3       4       1

5th order
                 k_11111  k_2111   k_221   k_311    k_32    k_41     k_5
[1^5]/n^(5)            1      -1       1       2      -2      -6      24
[21^3]/n^(4)           1       1      -2      -3       5      12     -60
[2^2 1]/n^(3)          1       2       1       .      -3      -3      30
[31^2]/n^(3)           1       3       .       1      -1      -4      20
[32]/n^(2)             1       4       3       1       1       .     -10
[41]/n^(2)             1       6       3       4       .       1      -5
[5]/n                  1      10      15      10      10       5       1

6th order
                k_111111 k_21111  k_2211  k_3111   k_222   k_321   k_411    k_33    k_42    k_51     k_6
[1^6]/n^(6)            1      -1       1       2      -1      -2      -6       4       6      24    -120
[21^4]/n^(5)           1       1      -2      -3       3       5      12     -12     -18     -60     360
[2^2 1^2]/n^(4)        1       2       1       .      -3      -3      -3       9      15      30    -270
[31^3]/n^(4)           1       3       .       1       .      -1      -4       4       4      20    -120
[2^3]/n^(3)            1       3       3       .       1       .       .       .      -3       .      30
[321]/n^(3)            1       4       3       1       .       1       .      -6      -4     -10     120
[41^2]/n^(3)           1       6       3       4       .       .       1       .      -1      -5      30
[3^2]/n^(2)            1       6       9       2       .       6       .       1       .       .     -10
[42]/n^(2)             1       7       9       4       3       4       1       .       1       .     -15
[51]/n^(2)             1      10      15      10       .      10       5       .       .       1      -6
[6]/n                  1      15      45      20      15      60      15      10      15       6       1
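The horizontal entries of the conversion table (an augmented symmetric function in terms of the k's) can be regenerated mechanically. The sketch below is an assumption-laden illustration: it reads formula (2·2) as giving each part r of the bracket the multinomial weight r!/(Π π! Π (p!)^π) for every partition of r, and then concatenating the suffixes. The names `part_expansion` and `augmented_in_polykays` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import factorial


def partitions(r, largest=None):
    """Yield the partitions of the integer r as non-increasing tuples."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def part_expansion(r):
    """Yield (partition of r, r! / (prod pi! * prod (p!)^pi)) for one part r."""
    for p in partitions(r):
        denom = 1
        for part, mult in Counter(p).items():
            denom *= factorial(mult) * factorial(part) ** mult
        yield p, factorial(r) // denom


def augmented_in_polykays(parts):
    """Return {suffix tuple: coefficient} for [parts] / n^(number of parts)."""
    out = Counter()
    for combo in product(*(list(part_expansion(r)) for r in parts)):
        # The suffixes of the resulting k are all the pieces taken together.
        suffix = tuple(sorted((x for p, _ in combo for x in p), reverse=True))
        coeff = 1
        for _, c in combo:
            coeff *= c
        out[suffix] += coeff
    return dict(out)
```

For example, `augmented_in_polykays((4, 2))` gives the row labelled [42] of the 6th order table, and `augmented_in_polykays((6,))` the last row.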






































4. If we define $K_{rs\ldots}$ as that same function of the N members of the population as $k_{rs\ldots}$
is of the n members of the sample, then, as shown by Tukey (1950) (who used $k'_{rs\ldots}$):

$$E_N(k_{rs\ldots}) = K_{rs\ldots}, \tag{4·1}$$

where $E_N$ denotes the mean value for the finite population. When N becomes infinite
$K_{rst\ldots}$ becomes $\kappa_r\kappa_s\kappa_t\ldots$. Consider now the problem of determining the sampling moment
coefficients of the k’s. To fix ideas, suppose we require the third moment coefficient of $k_1$
about its mean, which we shall call $M(1^3)$, following Fisher’s notation (Fisher, 1929), but
using M instead of $\mu$ to denote that it is the finite population result. Now

$$M(1^3) = E_N(k_1 - K_1)^3 = E_N(k_1^3 - 3k_1^2K_1 + 3k_1K_1^2 - K_1^3).$$

To use (4·1) we require $k_1^3$ and $k_1^2$ expressed as linear functions of the $k_{rs\ldots}$, and the result is
then simplified by dealing in a similar manner with the resulting non-linear functions of K’s.
Our next problem, then, is to determine the values of powers and products of powers of the
k’s. This is done algebraically by using a combination of David & Kendall’s (1949) ‘Tables
of symmetric functions’ and our tables of §3. To illustrate let us work out $k_2^2$.











(From §3)                  $k_2 = [2]/n - [1^2]/n^{(2)}$,
(From Tables of S.F.)          $= (2)/n - \{(1)^2 - (2)\}/n^{(2)}$
                               $= \{(2) - (1)^2/n\}/(n-1)$.

Therefore                  $k_2^2 = \{n^2(2)^2 - 2n(2)(1)^2 + (1)^4\}/(n^{(2)})^2$.

(From Tables of S.F.)

       [4]          [31]         [2^2]         [21^2]        [1^4]

       n^2           .            n^2            .             .
      -2n          -4n           -2n           -2n             .
        1            4             3             6             1
    ---------------------------------------------------------------
     (n-1)^2      -4(n-1)      n^2-2n+3       -2(n-3)          1        ÷ (n^(2))^2

(From §3)

       k_4          k_31          k_22           k_211            k_1111

       n-1         4n-4          3n-3           6n-6              n-1
        .         -4n+4            .          -12n+12            -4n+4
        .            .         n^2-2n+3      2n^2-4n+6         n^2-2n+3
        .            .             .        -2n^2+10n-12     -2n^2+10n-12
        .            .             .              .            n^2-5n+6
    -----------------------------------------------------------------------
  =    n-1           .          n(n+1)            .                .         ÷ n^(2)

$$= \frac{1}{n}k_4 + \left(1 + \frac{2}{n-1}\right)k_{22}.$$


A similar formula holds for $K_2^2$ in terms of $K_4$, $K_{22}$ and N.

Any such result can be worked out by a combinatorial procedure, using Fisher’s 
rules. The note by Kendall following this paper establishes the validity of the combinatorial 
method. In the present case the patterns with non-zero coefficients are obviously 


    2  2 | 4          2  . | 2          1  1 | 2
    -----+--          .  2 | 2          1  1 | 2
    2  2 | 4          -----+--          -----+--
                      2  2 | 4          2  2 | 4





with numerical coefficients 1, 1 and 2, and n-coefficients $n^{-1}$, 1 and $(n-1)^{-1}$. Note that
a pattern occurs consisting of diagonal entries only; this is a particular case of that in which 
the columns may be divided into two or more classes, each confined to different sets of rows. 
The n-coefficient of such a pattern is the product of the n-coefficients of the sub-patterns 
comprising it. Fisher was able to ignore such patterns because he proceeded straight to 
cumulants. On the other hand, the following patterns, which correspond to the vanishing 
terms in the above algebraic derivation, can be seen from the rules to have zero coefficients: 








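    2  1 | 3        2  . | 2        1  1 | 2        1  . | 1
    .  1 | 1        .  1 | 1        1  . | 1        1  . | 1
    -----+--        .  1 | 1        .  1 | 1        .  1 | 1
    2  2 | 4        -----+--        -----+--        .  1 | 1
                    2  2 | 4        2  2 | 4        -----+--
                                                    2  2 | 4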





In fact if the expression to be evaluated does not contain a $k_1$, then there can be no $k_{rs\ldots}$
on the right with a unit suffix.
If we now take the expectation of $k_2^2$ over the finite population we get

$$E_N(k_2^2) = \frac{1}{n}K_4 + \left(1 + \frac{2}{n-1}\right)K_{22},$$

which for an infinite population becomes

$$E(k_2^2) = \mu'(2^2) = \frac{1}{n}\kappa_4 + \left(1 + \frac{2}{n-1}\right)\kappa_2^2,$$

which is otherwise known to be correct. For Fisher’s result is that

$$\kappa(2^2) = \frac{1}{n}\kappa_4 + \frac{2}{n-1}\kappa_2^2,$$

in addition to which we know that

$$\mu'(2^2) = \kappa(2^2) + \kappa^2(2) = \kappa(2^2) + \kappa_2^2.$$

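The finite population result for $E_N(k_2^2)$, and relation (4·1) on which it rests, can be checked numerically by enumerating every sample drawn without replacement from a small population. The following is a brute-force sketch under stated assumptions: the population values are arbitrary illustrative numbers, the helper names are hypothetical, and the three coefficient dictionaries are simply the $k_2$, $k_4$ and $k_{22}$ columns of the conversion table of §3.

```python
from fractions import Fraction
from itertools import combinations, permutations


def aug(x, parts):
    """Augmented symmetric function [p1 p2 ...]: sum over ordered tuples of
    distinct indices of x_i^p1 * x_j^p2 * ..."""
    total = Fraction(0)
    for idx in permutations(range(len(x)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= x[i] ** p
        total += term
    return total


def falling(n, r):
    """n^(r) = n (n-1) ... (n-r+1)."""
    out = 1
    for j in range(r):
        out *= n - j
    return out


def polykay(x, column):
    """Evaluate a k-statistic from one column of the conversion table."""
    n = len(x)
    return sum(c * aug(x, parts) / falling(n, len(parts))
               for parts, c in column.items())


# Columns of the conversion table (read downwards to the diagonal).
K2 = {(2,): 1, (1, 1): -1}
K4 = {(4,): 1, (3, 1): -4, (2, 2): -3, (2, 1, 1): 12, (1, 1, 1, 1): -6}
K22 = {(2, 2): 1, (2, 1, 1): -2, (1, 1, 1, 1): 1}


def mean_over_samples(pop, n, stat):
    """E_N: average of stat over all samples of size n without replacement."""
    samples = list(combinations(pop, n))
    return sum(stat(s) for s in samples) / len(samples)
```

With, say, `pop = [1, 2, 4, 7, 11, 16]` and `n = 4`, the average of `polykay(s, K2) ** 2` over all samples can be compared with `polykay(pop, K4) / n + Fraction(n + 1, n - 1) * polykay(pop, K22)`; exact rational arithmetic is used so that the comparison is an equality rather than an approximation.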


We may take advantage of Fisher’s rule for the addition of a $k_1$ provided we recognize
that, in addition to adding 1 to existing suffices in all possible ways, and dividing by n, we
must also add 1 as a new suffix without change of coefficient. Thus

$$k_3k_1 = \frac{1}{n}k_4 + k_{31} \quad (\text{from } k_3 = k_3)$$

and

$$k_2k_1^2 = \frac{1}{n^2}k_4 + \frac{2}{n}k_{31} + \frac{1}{n}k_{22} + k_{211}.$$

Other results may be derived from the above. Thus from the 2nd order result

$$k_1^2 = \frac{1}{n}k_2 + k_{11} \quad (\text{from } k_1 = k_1)$$

we have

$$k_2k_{11} = k_2\left(k_1^2 - \frac{1}{n}k_2\right) = \frac{2}{n}k_{31} - \frac{2}{n^{(2)}}k_{22} + k_{211}.$$

It should be noticed that Tukey (1950, p. 513) has an error in this result; also that he did
not attempt to be comprehensive in giving formulae for all combinations of the $k_{rs\ldots}$ for the
4th order.
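The rule for the addition of a $k_1$ stated above lends itself to a short mechanical sketch. In the hypothetical helper below a linear function of the $k_{rs\ldots}$ is held as a dictionary from suffix tuples to coefficients, and n is given a numerical value so that exact fractions can be used; neither convention comes from the paper.

```python
from fractions import Fraction


def add_k1(expr, n):
    """Multiply a linear function of the k_{rs...} by k_1.

    expr maps suffix tuples such as (3, 1) to coefficients; n is the sample size.
    """
    out = {}
    for suffix, c in expr.items():
        # Add 1 to each existing suffix in turn, dividing the coefficient by n.
        for i in range(len(suffix)):
            raised = list(suffix)
            raised[i] += 1
            key = tuple(sorted(raised, reverse=True))
            out[key] = out.get(key, 0) + Fraction(c) / n
        # Add 1 as a new suffix without change of coefficient.
        key = tuple(sorted(suffix + (1,), reverse=True))
        out[key] = out.get(key, 0) + Fraction(c)
    return out


# Two applications starting from k_2 give k_2 k_1 and then k_2 k_1^2.
n = 7
k2_k1 = add_k1({(2,): 1}, n)
k2_k1_squared = add_k1(k2_k1, n)
```

Applied repeatedly, the same function reproduces those relations of §5 that are obtained from a relation of lower order by adding a $k_1$.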

The full table of such relations is given in the following section, to the 6th order. They 
are set out in order of calculation and, for each order, a line divides the comparatively small 
number of relations which were worked out directly (these can be developed by combinatorial
rules, including the rule for the adding of a $k_1$) from the remainder which
were derived from those already worked out (including results of lower order). They 
were all carefully checked. 


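5. FORMULAE FOR POWERS AND PRODUCTS OF THE k’s

2nd order

$k_1^2 = k_2/n + k_{11}$.

3rd order

$k_2k_1 = k_3/n + k_{21}$,
$k_1^3 = k_3/n^2 + 3k_{21}/n + k_{111}$.
----------
$k_{11}k_1 = 2k_{21}/n + k_{111}$.

4th order

$k_3k_1 = k_4/n + k_{31}$,
$k_2^2 = k_4/n + (n+1)k_{22}/(n-1)$,
$k_2k_1^2 = k_4/n^2 + 2k_{31}/n + k_{22}/n + k_{211}$,
$k_1^4 = k_4/n^3 + 4k_{31}/n^2 + 3k_{22}/n^2 + 6k_{211}/n + k_{1111}$.
----------
$k_{21}k_1 = k_{31}/n + k_{22}/n + k_{211}$,
$k_2k_{11} = 2k_{31}/n - 2k_{22}/n^{(2)} + k_{211}$,
$k_{11}k_1^2 = 2k_{31}/n^2 + 2k_{22}/n^2 + 5k_{211}/n + k_{1111}$,
$k_{111}k_1 = 3k_{211}/n + k_{1111}$,
$k_{11}^2 = 2k_{22}/n^{(2)} + 4k_{211}/n + k_{1111}$.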








5th order

$k_4k_1 = k_5/n + k_{41}$,
$k_3k_2 = k_5/n + (n+5)k_{32}/(n-1)$,
$k_3k_1^2 = k_5/n^2 + 2k_{41}/n + k_{32}/n + k_{311}$,
$k_2^2k_1 = k_5/n^2 + k_{41}/n + 2(n+1)k_{32}/n^{(2)} + (n+1)k_{221}/(n-1)$,
$k_2k_1^3 = k_5/n^3 + 3k_{41}/n^2 + 4k_{32}/n^2 + 3k_{311}/n + 3k_{221}/n + k_{2111}$,
$k_1^5 = k_5/n^4 + 5k_{41}/n^3 + 10k_{32}/n^3 + 10k_{311}/n^2 + 15k_{221}/n^2 + 10k_{2111}/n + k_{11111}$.
----------
$k_{31}k_1 = k_{41}/n + k_{32}/n + k_{311}$,
$k_3k_{11} = 2k_{41}/n - 6k_{32}/n^{(2)} + k_{311}$,
$k_{22}k_1 = 2k_{32}/n + k_{221}$,
$k_{21}k_2 = k_{41}/n + (n-3)k_{32}/n^{(2)} + (n+1)k_{221}/(n-1)$,
$k_{21}k_1^2 = k_{41}/n^2 + 3k_{32}/n^2 + 2k_{311}/n + 3k_{221}/n + k_{2111}$,
$k_2k_{11}k_1 = 2k_{41}/n^2 + 2(n-3)k_{32}/(nn^{(2)}) + 3k_{311}/n + 2(n-2)k_{221}/n^{(2)} + k_{2111}$,
$k_{11}k_1^3 = 2k_{41}/n^3 + 6k_{32}/n^3 + 7k_{311}/n^2 + 12k_{221}/n^2 + 9k_{2111}/n + k_{11111}$,
$k_{211}k_1 = k_{311}/n + 2k_{221}/n + k_{2111}$,
$k_{21}k_{11} = 2k_{32}/n^{(2)} + 2k_{311}/n + 2(n-2)k_{221}/n^{(2)} + k_{2111}$,
$k_2k_{111} = 3k_{311}/n - 6k_{221}/n^{(2)} + k_{2111}$,
$k_{111}k_1^2 = 3k_{311}/n^2 + 6k_{221}/n^2 + 7k_{2111}/n + k_{11111}$,
$k_{11}^2k_1 = 4k_{32}/(nn^{(2)}) + 4k_{311}/n^2 + 2(5n-4)k_{221}/(nn^{(2)}) + 8k_{2111}/n + k_{11111}$,
$k_{1111}k_1 = 4k_{2111}/n + k_{11111}$,
$k_{111}k_{11} = 6k_{221}/n^{(2)} + 6k_{2111}/n + k_{11111}$.
6th order 
kk, = ke/n+ks,, 
kgk, = keln + (m+ 7) kygo/(n — 1) + 6K9/(n — 1), 
k§ = ken + Dk go/(m — 1) + (m + 8) kegg/(m — 1) + 6nkggo/(n — 1), 
kk? = ke[n? + 2k, /n + kgg/n + kegs, 
kgkak, = ke/n? + kg,/n + (n + 5) kyg/n™ + (n + 5) kegg/n™ + (0 + 5) ego, /(n — 1), 
kg = kg/n? + 3(n + 3) kgg/n™ + 4n(n — 2) kegg/(n)? + (m+ 1) (n+ 8) kgoo/(n — 1), 
keg ki = ke/n® + 3k,/n® + 3kqq/n? + kegg/n? + 3kq11/0 + 3kgq1/0 + kyr, 
KEKE = ke/n? + 2k,,/n? + (3n + 1) kgo/(nn™) + 2(m + 1) kegg/(nm™) + kegy,/n + 4(m + 1) egg, /n™ 
+ (+ 1) kggg/n™ + (n+ 1) kgg11/(m— 1), 
ka kt = k/n* + 4ks,/n3 + Thq./n? + 4kgq/n3 + 6ky1,/n? + 16 kyo /n® + Skgo9/n? + 4kg11,/0 
+ 6hoo11/0 + kerri, 
KS = ke/n5 + 6k, /n* + 15kg./n4 + 10kg5/n* + 15k4,,/n? + 60kgo,/n®* + 15k g99/n3 
+ 20kg111/n? + 45kg911/0* + 15ko 31)/2 + Ayaan: 
kg ky = kyy/n + kgg/n + keg, 
kgky, = 2kg,/n — 8kqg/n™ — 6kg/n™ + kay, 
Keggk, = kyg/n + kegg/n + kegas, 
egy kg = kg,/n — 2k qg/n™ + kegg/n + (m + 5) kggs/(n — 1), 
kegkyy = ks,/n + (n — 4) kygq/n' — 3kgg/n™ + (n + 5) keyg /(m — 1) — 6nkgaq/n™, 
kegg kg = 2kga/n — 2kgg/n™ + (n + 3) kgga/(n — 1), 














JOHN WISHART 7 


keg ki = kgy|n? + 2kgo/n? + kgg/n? + 2hgy,/” + Fkgqy/n + key, 
kkk, = 2ks,/n® + 2(n — 4) kyo/(nn™) — 6hegg/(nn™) + 3kq31/n + 2(n — 4) keyg, /n™ + kegyyy, 
kag k= = 2ky/n? + 2kgg/n? + 4hgo,/n + kooe/n + koorr, 
keg kak, = kz,/n? + 2(m — 2) kyo/(nn™) + (nm — 3) kgg/(nn™) + kqyy/% + (3n — 1) kg, /n 
+ (m+ 1) kggq/n + (7 + 1) koo11/(m— 1), 
Lf deyy = ksy/n®— Shag (nm™) + 2(n®— 20+ B) bal (0)? + kegyy/m-+ (+ 1) Rg 
— 4n(n + 1) kyo9/(n)* + (nm + 1) ky4)/(n — 1), 
ko KY = kegy|/n? + 4hgo/n? + 3kgg/n? + 3kqy,/n? + 13k g9,/n? + 3k oq9/n? + 3kg11,/0 
+ 6914/0 + keri, 
ky ky, ki = 2k, /n3 + 4(n — 2) kyo/(n?2n™) + 2(n — 3) kgg/(n2n™) + 5k,,,/n? + 4(3n — 5) kyo, /(nn™) 
+ 2(n — 2) Kao0/(nn) + 4h51,/n + (5m — 7) kyo /n™ + kori, 
ky Kf = 2ks,/n* + 8kyo/n* + 6kgg/n* + Vk 4y,/> + 44k g0)/n? + 12kg99/n* + 16kg11/n* 
+ 39kg91,/n? + 14koy iy /2 + Kyi, 
Reg hy = Kyu /n + 2kggy/n + kei, 
keg ky = 2k yon + 2kgys/n + 2(n — 4) egg, /M™ + kairr, 
Koo ky = 2kyq,/n + kyoe/2 + keors, 
Keg ky, = 2kgg/n + 4hgo,/n — 4hy99/n™ + hoor, 
keg ky, = 3k4y,/n — 18k g9,/n + 12kyo9/n + kyr, 
Keay kg = Kqyy/n + 2(n — 3) keggy/n™ — 2kygg/n™ + (n+ 1) koo13/(n — 1), 
Ky = keqo|n + kegg/n + hogy, /n + 2(m — 3) keggy/n™ + (n? — 10 + 4) kggg/n™ 
+ (m+ 1) kgg14/(n— 1), 
Keay hi = Keigyy/n? + 6kegg,/n? + Wk gog/n? + Bhgy 34/2 + 5koeq3/% + keris, 
Kigy ky key = Begg/(nn™) + WAkga/(nn™) + 2kgy/n* + 2(5n — 6) kgq,/(nn™) + 2(n — 2) kyge/ (nn) 
+ 3hgy1,/0 + (5 — 7) keggy/2™ + kerri, 
keg kyyy ky = 3kqy,/n? + 6(n — 3) kegoy/(nn) — 6heyo9/(nn™) + 4k5y1,/n + 3(n — 3) kooq,/n™ + keoyra1, 
kik}, = 4kqo/(nn™) — 4ks5/(n™)? + 4k41,/n? + 8(n — 3) kg, /(nn™) + 2(n? — n + 4) kege/(n)? 
+ 4hg11;/n + 4(n — 2) kgo1,/2 + kerri, 
Heyyy ky = 3k gy)/n5 + 18kgq)/n* + Bhegg/n> + 10kg1 11/0? + 27k 9044/0? + L2keq n/N + kya 
ki ky = Ahgo/(n®n®) + 4hgs/(n?n™) + 4h 4,)/n? + 8(4n — 3) kgq/(n?n™) + 2(5n — 4) Kgge/(n?n) 
+ 12k 11,/n? + 2(17n — 16) kyoqy/(nm™) + 13hei114/0 + yaaa, 
Koray ky = begyyy/0 + 3hg911/0 + kerr, 
egy yy = 4h gq)/n™ + 2kygg/m™ + 2kgy1,/m + 2(2n — 3) koa /2™ + kerr 
egy heyy, = Skgq,/n — 6hgo/n™ + 3kgy1,/n + 3(m — 3) kor + koriar, 
Reg heyyy, = 4h 3311/2 — 12k 9914/2 + karin, 
Keyyyy RY = 4h g1 1/2? + 12k 914/02? + 9k /M + Kyun, 
Ry ky ky = 12k91/(nn™) + 6hygo/(nn™) + 6k11,/n? + 6(4n — 3) kooy,/(nn™) 
+ Wey /2 + kya 
Ki, = 4hgg/(n™)? + 24kg9,/(nn™) + 8(n — 2) kygg/(n™)? + 8kqq33/n* 
+ 6(5n — 4) kyoy3/(nm™) + 12k /2+ Kyi 
Raya ky = Shey /0 + kya, 
Ray ky, = 12k 99/2 + Ske /t+ ky 
Ki; = Okggq/n®™ + 18k_9),/n + Mori /M + ky yir11- 




6. As already indicated, operating with $E_N$ on the first sets of formulae in §5 will give the
appropriate moment coefficients about the origin, and with $E$ the moment coefficients
about the origin for the infinite population case, corresponding to Fisher's cumulants. In
proceeding to find the moment coefficients about the mean, in the finite population case,
much work is saved by noting that any $K$ having a unit suffix can be dropped at the final
stage. Thus

$$
\begin{aligned}
M(1^3) &= E_N(k_1 - K_1)^3 \\
&= E_N(k_1^3) - 3K_1E_N(k_1^2) + 2K_1^3 \\
&= K_3/n^2 - 3K_3/(nN) + 2K_3/N^2 \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\left(\frac{1}{n} - \frac{2}{N}\right)K_3 \\
&= a\left(a - \frac{1}{N}\right)K_3
\end{aligned}
$$

if we write $a = \dfrac{1}{n} - \dfrac{1}{N}$.
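The result for $M(1^3)$ can be checked numerically by enumerating every sample of size $n$ from a small finite population. The following is only an illustrative sketch: the function names and the test population are hypothetical, and $K_3$ is taken to be Fisher's $k_3$ computed on the whole population, which is the assumption under which the formula is stated.

```python
from fractions import Fraction
from itertools import combinations


def k3(xs):
    """Fisher's k_3 for the values xs (needs at least 3 values)."""
    m = len(xs)
    mean = Fraction(sum(xs), m)
    s3 = sum((x - mean) ** 3 for x in xs)
    return Fraction(m, (m - 1) * (m - 2)) * s3


def third_moment_of_mean(population, n):
    """Exact E_N(k_1 - K_1)^3 over all samples of size n drawn without replacement."""
    big_n = len(population)
    pop_mean = Fraction(sum(population), big_n)
    samples = list(combinations(population, n))
    total = sum((Fraction(sum(s), n) - pop_mean) ** 3 for s in samples)
    return total / len(samples)


def formula(population, n):
    """a(a - 1/N) K_3 with a = 1/n - 1/N (assumed reading of the text)."""
    big_n = len(population)
    a = Fraction(1, n) - Fraction(1, big_n)
    return a * (a - Fraction(1, big_n)) * k3(population)
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than to within rounding error.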
Again,

$$
\begin{aligned}
M(21^3) &= E_N\{(k_2 - K_2)(k_1 - K_1)^3\} \\
&= E_N\,k_2(k_1^3 - 3k_1^2K_1 + 3k_1K_1^2 - K_1^3) - K_2M(1^3),
\end{aligned}
$$

the first part of which is

$$E_N(k_2k_1^3) - 3K_1E_N(k_2k_1^2) + 3K_1^2E_N(k_2k_1) - K_2K_1^3$$


1 4 1 2 : 1 7 
7 7a hst oa 3K(GKe+ hat ~ Koa + Kon) + 3Ky (, K,+ Ky) — K, K} 
1 4 i. ise 6 6 3 3 9 1 4 
= ahs pa Moa pay Be— gy Mea yy Moet yaks + py Meet ya Mee ya Xe— ya Mon 


ee Dy 1 1\/4 5 
= (5-x) e+ (3-H) (gx) 


ll 
R 
wo 
a 
-f 
— 
R 
aes 
R 
| 
| 
= 
oa 
aw 
R 
= 
e 


It follows that

$$M(21^3) = a^3K_5 - a(a - N^{-1})(K_2K_3 - K_{23}) + 3a^2K_{32}.$$

The last term in this formula has obvious associations with its infinite population analogue
$3\kappa_3\kappa_2/n^2$, which is the term $3\kappa(21)\kappa(1^2)$ which has to be subtracted from $\mu(21^3)$ to obtain
$\kappa(21^3)$. We are not concerned here with the cumulants of the multivariate distribution in
the finite population case, but it should perhaps be mentioned that if we accepted Tukey's
concept of an infinite population of samples of size $n$ from a population of size $N$ (Tukey,
1950, pp. 515-16) we might define a cumulant $K(21^3)$ which would be obtained from
$M(21^3)$ by subtracting $3M(21)M(1^2)$, i.e. $3a^2K_3K_2$, a term which could be combined with
the last term in the above formula. But this question really needs further consideration.
The final results of our operations are given in the next section. The simpler of the general
formulae are already known; for example, Irwin & Kendall (1944) gave $M(r) = K_r$ and
$M(r1) = aK_{r+1}$. They also gave the equivalent, in the special case $s = 2$, of our formula
for $M(rs)$.













By way of comparison it may be mentioned that Sukhatme (1944) worked out the moment
coefficients about the origin of the simultaneous distribution of the $m_r$, the sample estimates
of the finite population moment coefficients $M_r$. Skellam's (1949) somewhat simpler results
dealt similarly with the simultaneous distribution of the sample sums of powers (equivalent
to estimates of the $M'_r$).


7. RESULTS 


The formulae listed below for general $r$ and $s$ have been verified as far as the 6th order and
may well be true in general. Three further expressions of the 6th order have also been
worked out. Note that in addition to writing $a$ for $n^{-1} - N^{-1}$ we use $(rst\ldots)$ to denote
Fisher's $\kappa(rst\ldots)$ in which a term such as $\kappa_p\kappa_q\ldots$ is written $K_{pq\ldots}$.

$$
\begin{aligned}
M(r) &= K_r, \qquad M(r1) = aK_{r+1}, \\
M(rs) &= (rs) - (K_rK_s - K_{rs}), \\
M(r1^2) &= a^2K_{r+2} - a(K_rK_2 - K_{r2}), \\
M(r1^3) &= a^3K_{r+3} - a(a - N^{-1})(K_rK_3 - K_{r3}) + 3a^2K_{r+1,2}, \\
M(r21) &= na\{(r21)\} - a(K_{r+1}K_2 - K_{r+1,2} + K_rK_3 - K_{r3}), \\
M(r1^4) &= a^4K_{r+4} - na(a^3 + N^{-3})(K_rK_4 - K_{r4}) - 3a^2(K_rK_{22} - K_{r22}) + 4a^2(a - N^{-1})K_{r+1,3} \\
&\quad + 6a^3K_{r+2,2},
\end{aligned}
$$

in which the last three terms become the cumulant correction

$$4\kappa(r1)\,\kappa(1^3) + 6\kappa(r1^2)\,\kappa(1^2) = 4\kappa_{r+1}\kappa_3/n^3 + 6\kappa_{r+2}\kappa_2/n^3$$

in the limit as $N \to \infty$.
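As a check on the formula for $M(rs)$, the case $r = s = 2$ can be tested by brute force. The sketch below assumes that $(2^2)$ stands for $K_4/n + 2K_{22}/(n-1)$, i.e. Fisher's $\kappa(2^2)$ with $\kappa_4$ and $\kappa_2^2$ replaced by $K_4$ and $K_{22}$, and that the generalized statistic $k_{22}$ is obtained from the identity $k_2^2 = k_4/n + (n+1)k_{22}/(n-1)$. The function names and the test population are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def central_moments(xs):
    """Return (m2, m4), the central moments of xs with divisor len(xs)."""
    m = len(xs)
    mean = Fraction(sum(xs), m)
    m2 = sum((x - mean) ** 2 for x in xs) / m
    m4 = sum((x - mean) ** 4 for x in xs) / m
    return m2, m4


def k2(xs):
    m = len(xs)
    return m * central_moments(xs)[0] / (m - 1)


def k4(xs):
    m = len(xs)
    m2, m4 = central_moments(xs)
    return m**2 * ((m + 1) * m4 - 3 * (m - 1) * m2**2) / ((m - 1) * (m - 2) * (m - 3))


def k22(xs):
    """Generalized k-statistic k_22, from k_2^2 = k_4/n + (n+1) k_22/(n-1)."""
    m = len(xs)
    return Fraction(m - 1, m + 1) * (k2(xs) ** 2 - k4(xs) / m)


def var_k2_by_enumeration(population, n):
    """Exact E_N(k_2 - K_2)^2 over all samples of size n without replacement."""
    big_k2 = k2(population)
    samples = list(combinations(population, n))
    return sum((k2(s) - big_k2) ** 2 for s in samples) / len(samples)


def var_k2_formula(population, n):
    """M(2^2) = (2^2) - (K_2 K_2 - K_22), with (2^2) = K_4/n + 2 K_22/(n-1)."""
    big_k2, big_k4, big_k22 = k2(population), k4(population), k22(population)
    return big_k4 / n + 2 * big_k22 / (n - 1) - (big_k2**2 - big_k22)
```

For $n = N$ the sample is the whole population, so both functions return zero, which gives a quick sanity check on the sign conventions.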

The remaining three particular formulae of the 6th order are listed below. It will be
seen that a general pattern for $M(1^p)$ is emerging; the others are suggestive of what the
general pattern is going to be (see also $M(2^4)$ in §8):

$$M(1^6) = na(a^5 + N^{-5})K_6 + 15na^2(a^3 + N^{-3})K_{42} + 10a^2(a - N^{-1})^2K_{33} + 15a^3K_{222},$$

in which the last three terms become the cumulant correction in the limit,

$$M(2^3) = (2^3) - \frac{3}{n}(K_4K_2 - K_{42}) - \frac{3(n+1)}{n-1}(K_{22}K_2 - K_{222}) + 2(K_2^3 - K_{222}),$$
M(2212) = n2a?{(2212)} — 2a2( Ky Ky — Kyy) — 20t( Kop Ky — Kony) + 0(K3 — Kooo) 
+ a{Ky9/n + 2Ky99/(n — 1)} + 2a? Kgg, 


in which the last two terms become in the limit the cumulant correction $\kappa(2^2)\,\kappa(1^2) + 2\kappa^2(21)$.

On proceeding to the limit as $N \to \infty$, subtracting where appropriate the cumulant
correction, we see that only the first terms in all these formulae survive, and these are
readily identified with the corresponding infinite population cumulants.


8. HIGHER ORDER FORMULAE FOR POWERS AND PRODUCTS OF THE k’s 


It would take a great deal of space to write down all the formulae of §5 for even the 7th 
and 8th orders. But, as was indicated earlier, it is only necessary to work out a small number 
of these directly, the remainder being then determinable by algebraic substitution. Thus, 
for the 7th order, there are only 14 fundamental formulae, and, for the 8th order, 21. Many 










of these contain $k_1$ or a power of $k_1$, so that they are readily determinable from lower order
results, as previously shown. This method disposes of 11 out of the 14 fundamental formulae
of the 7th order, and of 15 out of the 21 of the 8th order. 6 out of the 11, and 7 out of the
15, are of the type $k_rk_1^s$ and can be written down at once from the appropriate rows of David
& Kendall's (1949) 'Tables of symmetric functions'.

The remaining formulae, namely, 3 of the 7th order and 6 of the 8th order, have been
worked out by combinatorial methods, and checked by deriving the appropriate moment
coefficients about the origin from Fisher's (1929) table of formulae for the cumulants of
combinations of k-statistics in the infinite population case. For example, to determine
$k_3k_2^2$ we start with $\kappa(32^2)$, and then, noting that

$$\mu'_{12} = \kappa_{12} + \kappa_{10}\kappa_{02} + 2\kappa_{11}\kappa_{01} + \kappa_{10}\kappa_{01}^2,$$

add to it the terms $\kappa(3)\,\kappa(2^2)$, $2\kappa(32)\,\kappa(2)$ and $\kappa(3)\,\kappa^2(2)$, all of which are given. Finally, the
combinations of $\kappa$'s in the answer are replaced by the appropriate $k_{rs\ldots}$.

The fundamental 14+ 21 formulae only are listed below for the benefit of those who 
would like to extend the results of §7. 


7th order 
kek, = k,|n +key, 
kisky = k,/n + (n+ 9) ksg/(n — 1) + 20k4g/(n — 1), 
hig = kq/n + 12ksq/(n — 1) + (m+ 29) keqa|(n — 1) + 36 nkggo/(n — 1)®, 
kk? = k,n? + 2kg,/n + kso/n + ksi, 
kykgk, = k,/n? + kg,/n + (n +7) kgg/n + (m + 19) kyg/n + (0 + 7) gg, /(m — 1) + 6hegg1/(n — 1), 
kjk, = k,/n® + ke,/n + Dksq/n™ + (2n + 25) kyg/n™ + Dkego,/(m — 1) + (n+ 8) kggy/(m — 1) 
+ 18ks99/(n — 1) + 6k y991/(m — 1), 
ky k = k,/n? + 2(n + 7) kgo/n™ + (n? + 22n — 35) keygg/{(n — 1) n®} 
+ (m+ 5) (n+ 7) kggo/(n — 1)?, 
ky ky = k,|n° + 3kq,/n® + 3k5q/n® + kqg/n® + 3k 11/0 + Bkqgi/n + kyr, 
ky kk? = k,|/n3 + 2kg,/n? + 2(n + 2) kso/(nn™) + 3(n + 5) kyg/(nn™) + ks,,/n 
+ 2(m +5) kyo,/n + 2(n + 5) kgg,/n + (n + 5) kegog/n™ + (n + 5) kgoy,/(n— 1), 
3k, = k,/n3 + kg, /n® + 3(n + 3) kgo/(nn™) + (3n? + 14n — 25) kyg/(n)? + 3(m + 3) kyo /n® 
+ 4(n — 2) kggy/{(n — 1) n} + 3(m + 1) (n+ 3) kgog/{(n — 1) n} 
+ (m+ 1) (+3) koo9,/(n— 1)?, 
keg kt = k,/n' + 4ke,/n3 + 6kgo/n3 + Skyg/n? + 6ky1,/n® + 12kgy,/n® + 4kgg,/n® + 3ky9,/n* 
+ 4hegyy,/0 + Bkegoy,/0 + kari, 
kg ky = k,/n' + 3kg,/n3 + (Sn — 1) ksg/(n?n™) + (7m + 5) kgg/(n?n™) + 3k,,,/n? 
+ 3(3n + 1) kyos/(nn™) + 6(m + 1) kggy/(nn™) + 7( + 1) kgoo/(nn™) + kqry/n 
+ 6(n + 1) kgoy,/2 + 3( + 1) kggo,/n™ + (nm + 1) keoyay/(n— 1), 
kak? = k,/n5 + 5k,,/n* + l1kso/n* + 15k4,/n* + 10k5,,/n? + 35kq9,/n® + 20k 551/03 + 25kgo0/n* 
+ 10k 4334/2? + 40kgo11/n? + Lk ggg, /? + Skegyyy1/ + 10kg9114/% + koran 
ki = k,|/n® + The /n5 + 21ks9/n® + 35k4,/n® + 21k,,,/n* + 105kgo,/n4* + 70kgg,/n* + 105ks90/n4 
+ 85k 411,/n* + 210k 9911/0 + LO5Ky991/N* + 35k) 31)/N? + 105k99 11/0” 
+ Whoa /M+ Ayaan: : 










8th order 
kk, = k/n+ky, 
keke = kgln + (m+ 11) kgg/(n — 1) + 30K 55/(n — 1) + 20K44/(n — 1), 
kik = ky/n + 15kg9/(n — 1) + (n + 44) ksg/(m — 1) + 30K g4/(n — 1) + 60K gy9/(n — 1) 
+ 90nkg39/(n — 1), 
k2 = kg/n + 16kgo/(n — 1) + 48k55/(% — 1) + (n + 33) kegg/(m — 1) + 72K gy9/(n — 1)” 
+ 144nk339/(n — 1) + 24n(n + 1) keo99/(n — 1), 
kk? = ky|n? + 2ky,/n + kgo/n + ker, 
kskyk, = ky[n? + kq/n + (n +9) kgo/n™ + (n + 29) kgg/n® + 20kgq/n™ + (n + 9) ksoy/(m — 1) 
+ 20k4s,/(n — 1), 
kgkgky = ky/n? + ky/n + 12K gon + (m+ 41) kigg/n™ + (n+ 29) kigy/n™ + 12k, /(n — 1) 
+ (n+ 29) kegg,/(m — 1) + 36K 4gy9/(m — 1)® + 72kggo/(m — 1) + 36nkga9,/(n — 1)®, 
kk = ky/n® + 2(n + 9) kgo/n + 8(5n — 7) keg/{(m — 1) n®} + (n? + 26n — 39) kyg/{(n — 1) n®} 
+(n+7) (n+ 9) Kgoo/(m — 1)? + 12(m + 9) kggo/(n — 1)?, 
KZ key = kq/n? + (n+ 20) Kgg/n® + 2(n? + 22n — 32) kigy/{(n — 1) n} + 9(3n — 5) Kqg/{(n — 1) n® 
+ 9(n? + 9n — 20) kggo/{(m — 1)? (n — 2)} 
+ (n? + 17n? + 104m — 320) kggq/{(m — 1)? (nm — 2)} + 6n(n + 5) kgoo0/{(m — 1)* (mn — 2)}, 
kek} = ky/n? + 3k,,/n? + 3kgq/n? + kgg/n? + 3kg11/n + Bk59)/2 + kar, 
kykk? = ky/n? + 2k.,/n* + 2(n + 3) kgo/(nn™) + 2(n + 13) kg/(mn™) + (nm + 19) iegq/ (nn) 
+ kgy,/2 + 2( + 7) ks, /n™ + 2( + 19) kygg,/n® + (0 + 7) Kgog/n® + 6kggo/n™ 
+ (2 +7) Kgoyy/(m — 1) + 6hg314/(n — 1), 
kik? = kg/n? + 2k, /n? + (n + 8) kgo/(nn™) + 2(n + 17) kgg/(nn™) + (2n + 25) ky! (nn) 
+ kgy,/n + 18ks59,/n™ + 2(2n + 25) kgs, /n + Dkgoo/(m — 1) + (n? + 6n + 20) kgyo/n™ 
+ Deg /(m— 1) + (m+ 8) keggyy/(m — 1) + 36K gg9)/( — 1) + Bheoaa9/(m — 1) 
+ 6k go011/(m — 1), 
kg kik, = kg/n? + ky/n? + 2(n + 7) kgo/ (nn) + 4(n? + 14n — 21) kgg/(n)? 
+ (n? + 22n — 35) kyq/(n™)? + 2(n + 7) kego,/n™ + (n? + 22n — 35) kegs, /{(n — 1) n®} 
+ (m+ 5) (+7) kgoo/{(m — 1) n} + 2(n + 5) (n +7) Kggo/{(m — 1) n} 
+ (n+ 5) (n+ 7) kgo01/(m — 1)?, 
Kf = kgln® + 4(r0 + 5) kga/(nm) + 32(n— 2) keg|(n™)? 
+ (3n3 + 23n? — 63n + 45) kyy/{(m — 1) (n)?} + 6( + 3) (2 + 5) Kygoo/{(n — 1) n} 
+ 16(n — 2) (m+ 5) hggq/{(m — 1)? n} + (m+ 1) (m+ 8) (m+ 5) kooa9/(n — 1)°, 
kykf = ky/n* + 4k,,/n? + 6hey/n* + 4ksg/n3 + kqq/n® + 6hg1,/n? + 12ksp,/n® + 4kgg,/n® 
+ Bk goo/n* + 4h533)/0 + 6hyoy/2 + kaa, 
kkk} = k,/n* + 3k,,/n3 + 2(2n + 1) kgo/(n?n™) + (5n + 19) kgg/(n?n™) + 3(m + 5) keyg! (n?n) 
+ Bkgy,/n* + 6(n + 2) kso,/(n™) + 9(n + 5) Kggy/(nm™) + 3(m + 5) kgoo/(nn™) 
+ 4(n + 5) keggo/(nn™) + kgyy,/n + 3(m + 5) Kyoys/n™ + 3(m + 5) heyyy /n™ 
+ 3(n + 5) kggo)/n® + (n + 5) kgor14/(n — 1), 
KBK2 = kg/m + 2ky,/n? + 4(n + 2) Kegg/(n2n®) + 2(3n? + 10n— 17) kea/{n(n)2} 
+ (3n? + 14 — 25) kyq/{n(n)?} + kgy1/n? + 6(n + 3) ke, /(nn™) 
+ 2(3n? + 14n — 25) kyg,/(n™)? + 6(n + 3) kgoo/{(m — 1) n®} 
+ 2(3n? + 14n + 5) kggo/(n™)? + 3(m + 3) kegos,/n + 4(m — 2) kegyy,/{(m — 1) n} 
+ 6(n + 1) (n +3) kgoo1/{(m — 1) n} + (n + 1) (m+ 3) keaa9/{(m — 1) 0} 
+ (m+ 1) (m+ 3) keo011/(n — 1)?, ’ 










kk} = k,/n® + 5ky,/n4 + 1LOkgo/n* + L1k5g/n4 + Skqy/n4 + 10k,,,/n? + 30k 59,/n3 + 25k45,/n3 
+ L5kgg9/n? + 10k 39/0? + 10k534,/n? + 30K q913/n? + LOK 451) /0? + 15k g99)/0? + 5k gy / 
+ 10kg9113/2 + karan, 

kg kt = ky/n® + 4kq,/n4 + 4(2m — 1) kgg/(n8n™) + 4(3n + 1) kgg/(n8n™) + (70 + 5) keyg] (nn) 
+ 6ke,/n* + 4(5n— 1) kso,/(n?n™) + 4(72 + 5) kgs, /(n?2n™) + 2(82 + 5) kggo/(n?n™) 
+ 20(n + 1) kggq/(n?n) + 4kesyy)/n® + 6(3n + 1) kqoy1/(nm™) + 12(n + 1) kggii/(nn™) 
+ 28(n + 1) kgo0;/(nn™) + 3(m + 1) Kyo00/(mm™) + kqyyyy/m + 8(m + 1) kgo1)/2™ 
+ 6(m + 1) Ky914/n™ + (m + 1) keoii11/(m— 1), 

ky k& = k,/n® + 6k,,/n> + 16key/n* + 26k54/n® + 15kyq/n® + 15k¢,,/n4 + 66k5o,/n4 + 90k,,,/n4 
+ 60k4o9/n* + T0kg39/n4 + 20k5114/0 + 105k 4211/n* + 60kg3),/n> + 150K599,/n* 
+ 15kgo99/0* + 15k 43311/0? + 80k g9414/N? + 45h 99911/0? + 6hgi 3111/2 
+ 15k o91111/% + Feria, 

K8 = keg|n? + Sky, |n® + 28kqq/n® + 56Ksq/n® + 35kq4/n® + 28kqq,/n® + 168kgp,/° 

+ 280kq5,/n5 + 210k yo9/n* + 280k 550/n® + 56k53,,/n* + 420K 4go),/n* + 280K5;,/n4 
+ 840k5901/4 + 105k g.00/* + TOK gi 113/0* + 560k 5911,/23 + 420k 99011/n" 
+ 56 hay 3113/2? + 210k g94113/2? + 28h i/” + Mya: 


As an example of what can be done with the aid of the above formulae we work out below
the fourth moment of $k_2$.

$$
\begin{aligned}
M(2^4) &= E_N(k_2 - K_2)^4 \\
&= E_N(k_2^4) - 4K_2E_N(k_2^3) + 6K_2^2E_N(k_2^2) - 3K_2^4
\end{aligned}
$$


6 60 
+ (2°) + = Mi Skat (c taj) KE qo 


16(n—2 12 44 
+ = S43) x ait aa Hed wanna) Kee 


n(n —1)? saz t n—1 (n 
a. 12(n +3) 16(n — 2) 
ate nin = 1) A nin TP 


_ 4(n +1) (n+3) 





6(n + 1) 


Koo Ke +°K,Ki+ 1 K,, K3 —3K$. 


(n—1) 
With some rearrangement of terms this becomes 
4 . 12(n+3 
M(2*) = (24) ie n2 (K,K, - Kg) — iy i (Ky. Ky — K422) 


_ 16(n—2) 
te 1)? 


“(Ky K}— Kyo.) + — 


4(n +1) (n+3) 
(n—1) 
(Ky. K3 — Ky292) 


—_—_—. (Ks K,- — K332) — (Koo0K, — Ky202) 


See) 





Ky 4K, 4K 
= gaia 422 2222 
3(K2- Ke) +3(58 2 Tiin— +) 


It will be noted that the sets of n-coefficients which are multiplied by $-4$, $+6$ and $-3$
respectively, up to the penultimate term, are those occurring in $k_2^3$, $k_2^2$ and $k_2$. The last term,
in the limit as $N \to \infty$, becomes the cumulant correction $3\kappa^2(2^2)$.


My thanks are due to Mr D. A. East, who checked all the results of §5.



















REFERENCES 


DAVID, F. N. & KENDALL, M. G. (1949). Biometrika, 36, 431.
DRESSEL, P. L. (1940). Ann. Math. Statist. 11, 33 (44).
DWYER, P. S. (1938). Ann. Math. Statist. 9, 1, 97.
FISHER, R. A. (1929). Proc. Lond. Math. Soc. 30, 199.
IRWIN, J. O. & KENDALL, M. G. (1944). Ann. Eugen., Lond., 12, 138.
SKELLAM, J. G. (1949). J. R. Statist. Soc. B, 11, 291.
SUKHATME, P. V. (1944). Sankhya, 6, 363.
TUKEY, J. W. (1950). J. Amer. Statist. Ass. 45, 501.







MOMENT-STATISTICS IN SAMPLES FROM 
A FINITE POPULATION 
By M. G. KENDALL 
Division of Research Techniques, London School of Economics 
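1. The problem of deriving the moment-statistics of sample moment-statistics is, in
essence, extremely simple. With the usual notation for an augmented symmetric function
of order w we have, for samples from an infinite population,

$$E[p_1^{\pi_1}p_2^{\pi_2}\ldots p_s^{\pi_s}] = n^{[\rho]}(\mu'_{p_1})^{\pi_1}(\mu'_{p_2})^{\pi_2}\ldots(\mu'_{p_s})^{\pi_s} \tag{1}$$

and for samples from a finite population

$$E[p_1^{\pi_1}p_2^{\pi_2}\ldots p_s^{\pi_s}]_n = [p_1^{\pi_1}p_2^{\pi_2}\ldots p_s^{\pi_s}]_N\,n^{[\rho]}/N^{[\rho]}, \tag{2}$$

where $\rho = \pi_1 + \pi_2 + \ldots + \pi_s$ is the number of parts, $n^{[\rho]} = n(n-1)\ldots(n-\rho+1)$, and
the subscripts n and N in equation (2) mean that the symmetric functions relate
respectively to the sample of n and the population of N members.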
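Equation (2) can be illustrated by direct enumeration. The sketch below assumes the usual definition of an augmented symmetric function as a sum over ordered selections of distinct members; the function names and the test population are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def augmented(xs, parts):
    """Augmented symmetric function [p1 p2 ...]: sum over ordered selections
    of distinct members of xs of x_i^p1 * x_j^p2 * ..."""
    return sum(
        prod(x**p for x, p in zip(chosen, parts))
        for chosen in permutations(xs, len(parts))
    )


def falling(n, r):
    """Falling factorial n(n-1)...(n-r+1)."""
    return prod(range(n - r + 1, n + 1))


def expected_augmented(population, n, parts):
    """Mean of the sample function over all samples of size n without replacement."""
    samples = list(combinations(population, n))
    return Fraction(sum(augmented(s, parts) for s in samples), len(samples))


def equation_2(population, n, parts):
    """Right-hand side of equation (2): [..]_N * n^[rho] / N^[rho]."""
    rho = len(parts)
    return augmented(population, parts) * Fraction(
        falling(n, rho), falling(len(population), rho)
    )
```

The agreement reflects the fact that any ordered selection of $\rho$ distinct population members appears in a random sample of $n$ with probability $n^{[\rho]}/N^{[\rho]}$.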

These two formulae contain all that is necessary to deal with the univariate case and can 
easily be extended to the multivariate case. The reasons that, in practice, this branch of the 
subject becomes algebraically complex are 

(a) that the symmetric-function statistics with which we are usually concerned are not 
given in terms of augmented symmetrics and therefore have to be converted into them; and 
likewise the parent augmented symmetrics have to be converted back to the types we 
usually require; 

(b) that powers and products of statistics, even those composed of augmented symmetrics,
are not augmented symmetrics and have to be converted into them; 

(c) that we may require the answer in terms of particular kinds of functions, particularly 
the cumulants, to deal more easily with special populations (e.g. the normal) or cases where 
the cumulants after a certain point can be neglected (e.g. the Edgeworth series); 

(d) that we may require for conciseness and concinnity to use particular kinds of statistics 
and to express our results in terms of particular kinds of parameters. 

2. Difficulties arising from (a) and (b) are now removed by the tables of symmetric
functions. Difficulties under (c) were to some extent resolved by Fisher's combinatorial
method of obtaining sampling cumulants, which is essentially a powerful short-cut. Since
the publication of the paper by Irwin & Kendall (1944) it has been known how to derive
the formulae for finite populations from those for infinite populations, this again being
essentially a short-cut method of circumventing the straightforward use of equation (2).
But hitherto no combinatorial method has been given for use in the finite case, and as
Dr Wishart points out in the foregoing paper, the key lies in the generalized k-statistics
suggested by Tukey (1950). The purpose of this note is to demonstrate the validity of the
method developed by Wishart.

3. The generalized k-statistic $k_{pqr\ldots}$ is most easily defined as having for its mean value the
product of cumulants $\kappa_p\kappa_q\kappa_r\ldots$. It then follows by the argument used by Irwin & Kendall
that the mean value in samples from a finite population is $K_{pqr\ldots}$.


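Consider now the operators

$$S_p = \sum_{j=1}^{n}\frac{\partial^p}{\partial x_j^p} \qquad (p = 1, 2, \ldots), \tag{3}$$

and the operators $\delta_p$ (see, for example, Kendall, 1952, p. 275) such that

$$
\delta_p\mu'_r =
\begin{cases}
r^{[p]}\mu'_{r-p} & (r > p), \\
p! & (r = p), \\
0 & (r < p).
\end{cases}
\tag{4}
$$

It is known that $\delta_p$ obliterates every cumulant except $\kappa_p$, i.e.

$$
\delta_p\kappa_q =
\begin{cases}
0 & (q \neq p), \\
p! & (q = p).
\end{cases}
\tag{5}
$$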

The expectation of a symmetric function after operation by a set of S's is, if a constant, the
same as the operation by corresponding $\delta$'s on the expectation. Thus for any combination of
operators S of total order w, say $S = S_{p_1}^{\pi_1}S_{p_2}^{\pi_2}\ldots S_{p_s}^{\pi_s}$, we find that

$$S\,k_{q_1q_1\ldots(\chi_1\ \text{factors})\,q_2q_2\ldots(\chi_2\ \text{factors})\,\ldots\,q_sq_s\ldots(\chi_s\ \text{factors})} = 0, \tag{6}$$

unless $p_j = q_j$ and $\pi_j = \chi_j$ for $j = 1, \ldots, s$, in which case

$$S\,k_{p_1p_1\ldots(\pi_1\ \text{factors})\,p_2p_2\ldots(\pi_2\ \text{factors})\,\ldots} = (p_1!)^{\pi_1}\ldots(p_s!)^{\pi_s}. \tag{7}$$


This is a generalization of a result (Kendall, 1940a) for the ordinary k-statistics. 

If now a function of k-statistics on the left, say, is expressed as a linear function of the
generalized k-statistics on the right, including a term in $k_{p_1p_1\ldots(\pi_1\ \text{factors})\ldots}$, and we operate
on both sides by S, everything is obliterated on the right except the coefficient of this
statistic and on the left we have the result of operating on the function by S. The evaluation
of coefficients is then the same as in the combinatorial method evolved by Fisher. It can be
demonstrated exactly in the manner given by myself (see Kendall, 1940b, 1952, pp. 276-9).
The only difference is that, as Wishart points out, we cannot use rule 3 (which enjoins
ignoration of patterns which fall into separate blocks) because we are here evaluating
moments, not cumulants.

4. This proof, the simplest I can find, depends on the introduction of parent cumulants, 
just as the Irwin-Kendall method depends notionally on formulae for the infinite population. 
This is a perfectly valid method of reasoning, but we may note that an alternative proof 
could be obtained without introducing the infinite population. Equations (6) and (7), in 
fact, concern only the obliterating properties of the operators S on the k-statistics and could 
be established directly after the manner (Kendall, 1940a) in which I originally proved the 
less general results for ordinary k-statistics. 

5. It may also be noted that the operational methods developed (Kendall, 1940c) for 
proceeding to the multivariate case are applicable here for developing formulae for the 
finite population. It hardly seems worth while displaying the resulting formulae, which 
would have a very limited usefulness in the present stage of development, but the methods 
are ready if they are ever required. 

6. It is also interesting to note that the above methods solve another outstanding
problem: given a function of moments or cumulants, to write down an unbiased estimator
of it. For example, the variance of the second moment about the mean is

$$\operatorname{var} m_2 = \frac{(n-1)^2}{n^2}\left(\frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n-1}\right), \tag{8}$$

but if we wish to use this to test significance, the expression on the right usually has to be
estimated from the sample and we require an unbiased estimator. This is written down at
sight as

$$\frac{(n-1)^2}{n^2}\left(\frac{k_4}{n} + \frac{2k_{22}}{n-1}\right),$$

and can then be easily converted to ordinary k-statistics or sample-moments by the use of
Wishart's tables. In this case

$$k_{22} = \frac{n-1}{n+1}\left(k_2^2 - \frac{k_4}{n}\right),$$

and hence the unbiased estimator of var $m_2$ is

$$\frac{(n-1)^2}{n^3(n+1)}\{(n-1)k_4 + 2nk_2^2\}.$$

Parallel results hold for the case of sampling from a finite population.
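The unbiasedness claimed for this estimator can be checked exactly on a toy example. The sketch below is illustrative only: it takes the "infinite population" to be a discrete distribution placing equal probability on a few values, enumerates every ordered sample of size $n$ drawn with replacement, and assumes the estimator has the form $(n-1)^2\{(n-1)k_4 + 2nk_2^2\}/\{n^3(n+1)\}$. The function names and the test values are hypothetical.

```python
from fractions import Fraction
from itertools import product


def stats(xs):
    """Return (m2, k2, k4) for the sample xs in exact rational arithmetic."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    k2 = n * m2 / (n - 1)
    k4 = n**2 * ((n + 1) * m4 - 3 * (n - 1) * m2**2) / ((n - 1) * (n - 2) * (n - 3))
    return m2, k2, k4


def estimator(xs):
    """Assumed unbiased estimator of var(m2); needs len(xs) >= 4."""
    n = len(xs)
    _, k2, k4 = stats(xs)
    return Fraction((n - 1) ** 2, n**3 * (n + 1)) * ((n - 1) * k4 + 2 * n * k2**2)


def check(values, n):
    """Return (exact var of m2, exact mean of the estimator) over all
    equally likely ordered samples of size n drawn with replacement."""
    samples = list(product(values, repeat=n))
    count = len(samples)
    m2s = [stats(s)[0] for s in samples]
    var_m2 = sum(m * m for m in m2s) / count - (sum(m2s) / count) ** 2
    mean_est = sum(estimator(s) for s in samples) / count
    return var_m2, mean_est
```

Calling `check([0, 1, 3], 4)` enumerates 81 samples; the two returned values should coincide if the estimator is unbiased.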


REFERENCES 


IRWIN, J. O. & KENDALL, M. G. (1944). Ann. Eugen., Lond., 12, 138.
KENDALL, M. G. (1940a). Ann. Eugen., Lond., 10, 106.
KENDALL, M. G. (1940b). Ann. Eugen., Lond., 10, 215.
KENDALL, M. G. (1940c). Ann. Eugen., Lond., 10, 392.
KENDALL, M. G. (1952). The Advanced Theory of Statistics, vol. 1, 5th ed. London: Charles Griffin
and Co.
TUKEY, J. W. (1950). J. Amer. Statist. Ass. 45, 501.













SOME EXACT TESTS IN MULTIVARIATE ANALYSIS 


By E. J. WILLIAMS 


Commonwealth Scientific and Industrial Research Organization, Melbourne 


Some exact significance tests for use in discriminant analysis are derived. A general method of 
deriving exact tests where sufficient or quasi-sufficient statistics do not exist is indicated. 


1. INTRODUCTION 


Suppose that a set of measurements of several variates is made on each of a number of 
individuals, belonging to different groups. A problem which frequently arises in the 
analysis of such data is to determine whether a single linear function of the variates can 
adequately discriminate between the groups to which the individuals belong, and to evaluate 
such a function. This problem is often stated as one in regression of a set of dependent 
variates on another set of variates, but we shall use here the terminology of discriminant 
functions; the results are the same whichever formulation of the problem is adopted. 

We begin by recapitulating some of the results in the theory. With p variates, and q + 1
groups, the set of population means may be represented as q + 1 points in p-dimensional
space. Generally, the points will lie in at most p or q dimensions, whichever is the less. If
r is in fact the number of dimensions in which the population points lie, then the populations
may be specified by a set of r variates, so that if and only if the points fall on a straight line
will a single hypothetical discriminant function specify the populations. In particular, the
scales and axes in the p dimensions may be so chosen that the p transformed variates are
uncorrelated and of equal variance. Then, by a further rotation and translation, the axes
can be chosen so that the sum of the projections of the q + 1 points on any axis is zero, while
the sum of squares of the projections of the q + 1 points on any axis assumes a stationary
value. The transformed variates corresponding to these new axes are the canonical variates
of the set (see Hotelling (1936): Hotelling's formulation is symmetrical in the two sets of
p and q variates, leading to two sets of canonical variates; in the present formulation, the
second set of q canonical variates, corresponding to group differences, is implicit in the
set of p).

The purposes of discriminant analysis may now be formulated as (i) to determine the 
minimum number of canonical variates in terms of which the data may be described
(collinearity, coplanarity, etc., of the groups), and (ii) to estimate them as linear compounds 
of the original variates measured. We assume that the variates have a multivariate normal 
distribution, the same for each group, about the population means characterizing the 
groups. Then the discriminant functions, or canonical variates of the sample, are those 
transformed variates which give stationary values to the ratio of the sum of squares between 
groups to the total sum of squares. The ratios are found as the roots of a matrix equation, 
and the coefficients of the original variates in the corresponding discriminant function as 
the corresponding latent vector. Geometrically, the procedure of determining the sample 
discriminant functions is equivalent to determining the principal axes of a p-dimensional 
ellipsoid. As with the population, so with any sample, the number of latent roots not 
identically zero is min(p, q).



The sample latent roots, $\theta_1, \theta_2, \ldots, \theta_p$, are defined as the stationary values of the ratio of
the sum of squares between groups to the total sum of squares. The discovery of the exact 
simultaneous distribution of these roots, for general values of p and q, on the null hypothesis 
that the groups do not differ (Fisher, 1939; Hsu, 1939; Roy, 1939) opened up the way for the 
development of exact significance tests. Subsequently the non-null distribution has been 
studied by many workers (Bartlett, 1947a, b; Roy, 1942a, b, 1946). Progress with the
development of exact tests, where by exact we mean independent of the unknown population 
parameter, has, however, been delayed by two circumstances: first, since the methods have 
not been extensively applied, the development of such tests has not been a matter of great 
practical urgency; and secondly, the heavy mathematics entailed in the study of the 
distribution of the latent roots has made the theory somewhat unattractive. 

Below is presented an approach by means of which tests independent of the population 
parameter may be derived. It is hoped to indicate lines along which the theory may 
profitably develop, answering the practical questions with exact tests, and at the same 
time avoiding some of the mathematical difficulties of the usual treatment. It is considered 
in particular that, while the study of the joint distribution of the latent roots, on the null 
hypothesis, is a necessary development from the theoretical point of view, and does lead 
to certain overall tests of significance, the study of the distribution of the individual roots 
(see Nanda, 1948) does not provide results of practical relevance. 


2. GENERAL REMARKS ON TESTS OF SIGNIFICANCE 


For the purposes of significance tests, the data are summarized in the set of p latent roots
$\theta_1, \theta_2, \ldots, \theta_p$ (when q < p, only q of these roots are not identically zero). If the null hypothesis
is not true, and the q + 1 points representing the group means lie in a space of r dimensions,
there will be r unknown parameters in the specification of the joint distribution of the 
latent roots. The problem of estimation, and accordingly of making tests of significance 
for the existence of the roots, would be solved if we could (i) find a sufficient set of r statistics 
for the parameters, and thus confine our attention to the r degrees of freedom provided by 
the set, or (ii) find a set of (p—r) functionally independent statistics, distributed inde- 
pendently of the parameters, and confine our attention to the conditional distribution of the 
roots, for fixed values of these statistics, which would have r degrees of freedom and enable 
conditional estimates and significance tests to be made (see Fisher, 1934; Pitman, 1939a, b).

It is easy to show, from the form of the distribution in the non-null case, that no sufficient 
set of statistics exists. It is more difficult to determine whether or not any set of statistics 
distributed independently of the parameters exists, but it is considered unlikely that they 
do. In the absence of solutions provided by either of these alternatives, we need to develop 
tests framed somewhat differently from those usually considered. 

When only one sample root is not identically zero the relevant tests are (i) the significance 
of the observed root, and (ii) the concordance of any proposed discriminant function (i.e.
when q = 1; or multiple regression relationship when p = 1) with the data; or, more generally, 
fiducial limits for the hypothetical relationship. Each is a test of a linear hypothesis, and 
accordingly analysis of variance methods are applicable. Where there is more than one 
sample root, further aspects of the data can be tested: (iii) the significance of departures 
from collinearity, coplanarity, etc., or, in general, the concordance of the group means with 
a space of r dimensions. 













The practical interpretation of the data, when differences among the groups exist, is most 
simple when only one discriminant function is found to account for this variation; however, 
any systematic study of this problem must take into account the more complex cases which 
may arise. We deal here in detail with the simplest cases, of one population parameter and 
p or q = 2, 3, as throwing light on the problems occurring in more complex cases, and give 
also a discussion of the problem for general values of p. 

To indicate the lines of development leading to this approach, it may be pointed out that, 
while the population parameters are of theoretical importance, what are of practical interest 
are the population discriminant functions; and that it may prove simpler, as well as more 
useful, to make significance tests and determine fiducial limits for these functions than for 
the parameters. An analogy may be found in univariate analysis of variance. The sum of 
squares between groups has a non-central $\chi^2$ distribution, the parameter of which may be
estimated from the data and fiducial limits derived for it; but, in general, it is the group 
means which are of interest. Simultaneous fiducial limits for these may be derived by 
a method indicated by Fisher (1949, §64). 


3. TESTS OF A PROPOSED DISCRIMINANT FUNCTION 


For the theoretical analysis we shall take as our p variates, not the original variates, but the
set of p sample discriminant functions, denoting them by $x_1, x_2, \ldots, x_p$. Where two or more
of the sample roots are equal (including zero roots) this definition is ambiguous; but we may
then choose any set of variates, uncorrelated in the sample, which correspond to this root,
without affecting the results of the analysis. We take the $x_i$ in decreasing order of magnitude
of the corresponding $\theta_i$, so that $\theta_1 \ge \theta_2 \ge \ldots \ge \theta_p$. Since the scale of the $x_i$ is arbitrary, we
choose the scales such that the total sum of squares of each $x_i$ is unity. The sums of squares
between groups will then be $\theta_1, \theta_2, \ldots, \theta_p$.

We now consider any hypothetical discriminant function y, whose concordance with the 
data we wish to test. The null hypothesis then specifies that 

(i) the population means of the groups fall on a straight line (that is, the differences 
among the groups are represented by a single parameter), and 

(ii) the p-dimensional vector y specifies the direction of this line. 

For the hypothetical discriminant function, the ratio of the sum of squares between
groups to the total sum of squares, which will be called the discriminant ratio, and denoted
by $\eta$, lies in the range $\theta_p$, $\theta_1$. Since, under the hypothesis to be tested, the ratio $\eta$ corresponds
to the population discriminant function, $\eta$ will be a sufficient statistic for the population
parameter, that is, the corresponding ratio for the population (see Bartlett, 1947a). The
conditional distribution of the $\theta_i$, given $\eta$, will therefore be independent of the unknown
parameter, and this conditional distribution will be used in deriving tests of significance.

It now remains to derive the joint distribution of $\eta$ and the $\theta_i$ in repeated sampling. Since
the conditional distribution, in which the value of the parameter is irrelevant, is ultimately
required, we may assume that the parameter is zero. In this case the directions represented
by $x_1, x_2, \ldots, x_p$ in the space of the original variables will be random. The direction represented
by y will therefore, when referred to the co-ordinate system of the $x_i$, be random in repeated
sampling. We may take the direction ratios of y in this co-ordinate system as independent
unit normal deviates $w_1, w_2, \ldots, w_p$, so that

$$y = \frac{\sum w_ix_i}{\sqrt{\sum w_i^2}}.$$














20 | Some exact tests in multivariate analysis 


Then
$$\eta = \frac{\sum w_i^2 \theta_i}{\sum w_i^2}.$$





A practically more useful specification of the hypothetical discriminant function is in terms
of the $p-1$ sample latent roots after elimination of $y$, which we denote by

$$\phi_1, \phi_2, \ldots, \phi_{p-1}.$$

The p-dimensional ellipsoid

$$\sum_i \frac{x_i^2}{1-\theta_i} = 1$$

has squared principal semi-diameters $1-\theta_1, 1-\theta_2, \ldots, 1-\theta_p$. The section by a $p-1$ space
normal to the direction specified by $(w_1, w_2, \ldots, w_p)$ is an ellipsoid with squared principal
semi-diameters

$$1-\phi_1, 1-\phi_2, \ldots, 1-\phi_{p-1}.$$

Hence the $\phi_k$ are given by the equation

$$\sum_i \frac{w_i^2 (1-\theta_i)}{\theta_i - \phi_k} = 0.$$
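As a numerical illustration, the sketch below solves for the roots after elimination of $y$ in a three-variate case and checks the relation $(1-\eta)\prod(1-\phi_k) = \prod(1-\theta_i)$ that is used later in the derivation. It assumes the defining equation reads $\sum_i w_i^2(1-\theta_i)/(\theta_i-\phi_k) = 0$; the values of `theta` and `w` and the helper name `eliminated_roots` are illustrative assumptions, not taken from the paper.

```python
import math

def eliminated_roots(theta, w, tol=1e-14):
    """Roots phi of sum_i w_i^2 (1 - theta_i) / (theta_i - phi) = 0, by bisection.

    theta must be strictly decreasing; one root lies between each adjacent pair.
    """
    c = [wi * wi * (1.0 - t) for wi, t in zip(w, theta)]

    def g(phi):
        # Same equation with the denominators cleared (a polynomial in phi).
        return sum(ci * math.prod(t - phi for j, t in enumerate(theta) if j != i)
                   for i, ci in enumerate(c))

    roots = []
    for hi, lo in zip(theta[:-1], theta[1:]):
        a, b = lo, hi
        ga = g(a)
        while b - a > tol:
            m = 0.5 * (a + b)
            gm = g(m)
            if (gm > 0) == (ga > 0):
                a, ga = m, gm
            else:
                b = m
        roots.append(0.5 * (a + b))
    return roots

theta = [0.6, 0.3, 0.1]     # assumed sample roots, for illustration only
w = [1.2, -0.7, 0.5]        # assumed direction ratios, for illustration only
s = sum(x * x for x in w)
eta = sum(x * x * t for x, t in zip(w, theta)) / s
phi = eliminated_roots(theta, w)
lhs = (1 - eta) * math.prod(1 - f for f in phi)
rhs = math.prod(1 - t for t in theta)
```

The two sides `lhs` and `rhs` should agree to rounding error, and each root should fall between adjacent values of `theta`.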

We can now transform from the conditional distribution of the $w_i$ given the $\theta_i$, to that of
the $p-1$ $\phi_k$, and a new variate

$$s = \sum w_i^2.$$

We readily find

$$w_i^2 = \frac{s(1-\eta)}{1-\theta_i}\,\frac{\prod_k (\theta_i-\phi_k)}{\prod_{j\ne i}(\theta_i-\theta_j)}.$$


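The result may be verified in the following way:

$$\sum_i A_i = \sum_i \frac{\prod_k (\theta_i-\phi_k)}{\prod_{j\ne i}(\theta_i-\theta_j)}$$

is a multilinear symmetric function of the $p-1$ variates $\phi_k$; and on giving the $\phi_k$ in
succession the $p$ sets of values

$$(\theta_2, \theta_3, \ldots, \theta_p),\ (\theta_1, \theta_3, \ldots, \theta_p),\ \ldots,\ (\theta_1, \theta_2, \ldots, \theta_{p-1}),$$

we find that the value of the expression is unity in each case; hence $\sum A_i$ is identically unity.
Likewise

$$\sum_i B_i = \sum_i \frac{1}{\theta_i-\phi_k}\,\frac{\prod_l (\theta_i-\phi_l)}{\prod_{j\ne i}(\theta_i-\theta_j)}$$

is a multilinear symmetric function of the $p-2$ variates $\phi_l$ excluding $\phi_k$. On giving these
$\phi_l$ the sets of values formed by any $p-2$ of the $\theta_i$,
we find that $\sum B_i$ is identically zero.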
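The two identities can also be checked in exact arithmetic. The sketch below assumes $A_i = \prod_k(\theta_i-\phi_k)/\prod_{j\ne i}(\theta_i-\theta_j)$ and $B_i = A_i/(\theta_i-\phi_k)$; the numerical values and the function names `sum_A` and `sum_B` are illustrative assumptions.

```python
from fractions import Fraction
from math import prod

def sum_A(theta, phi):
    """Sum over i of prod_k(theta_i - phi_k) / prod_{j != i}(theta_i - theta_j)."""
    return sum(prod(t - f for f in phi)
               / prod(t - u for j, u in enumerate(theta) if j != i)
               for i, t in enumerate(theta))

def sum_B(theta, phi, k):
    """Same sum with the factor (theta_i - phi_k) removed from each numerator."""
    rest = [f for l, f in enumerate(phi) if l != k]
    return sum(prod(t - f for f in rest)
               / prod(t - u for j, u in enumerate(theta) if j != i)
               for i, t in enumerate(theta))

# Arbitrary exact values, chosen only for illustration (p = 4).
theta = [Fraction(7, 10), Fraction(2, 5), Fraction(1, 4), Fraction(1, 10)]
phi = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]

total_A = sum_A(theta, phi)
totals_B = [sum_B(theta, phi, k) for k in range(len(phi))]
```

Because `Fraction` arithmetic is exact, `total_A` comes out as exactly one and every entry of `totals_B` as exactly zero, whatever values are chosen for the $\phi_k$.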










E. J. WILLIAMS 























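Now
$$w_i^2 = \frac{s(1-\eta)}{1-\theta_i}\,A_i,$$
so that
$$\sum_i \frac{w_i^2(1-\theta_i)}{1-\eta} = s\sum_i A_i = s,$$
while
$$\sum_i \frac{w_i^2(1-\theta_i)}{\theta_i-\phi_k} = s(1-\eta)\sum_i B_i = 0.$$
Also
$$\frac{\partial(w_1, w_2, \ldots, w_p)}{\partial(\phi_1, \phi_2, \ldots, \phi_{p-1}, s)} = \text{const.}\ \frac{w_1 w_2 \ldots w_p \prod_{i<j}(\theta_i-\theta_j)\prod_{k<l}(\phi_k-\phi_l)}{s\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|}$$
$$= \text{const.}\ s^{\frac12 p-1}\,\frac{\prod_i(1-\theta_i)^{\frac12(p-1)}\prod_{k<l}(\phi_k-\phi_l)}{\prod_k(1-\phi_k)^{\frac12 p}\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|^{\frac12}}.$$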





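Clearly, $s$ is distributed independently of the other variates, so that the joint distribution of
the $\phi_k$, for given $\theta_i$, is

$$\text{const.}\ \frac{\prod_i(1-\theta_i)^{\frac12(p-1)}\prod_{k<l}(\phi_k-\phi_l)}{\prod_k(1-\phi_k)^{\frac12 p}\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|^{\frac12}}\prod_k d\phi_k.$$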
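The joint distribution of the $\theta_i$ is known to be

$$\text{const.}\ \prod_i\theta_i^{\frac12(q-p-1)}\prod_i(1-\theta_i)^{\frac12(n-q-p-1)}\prod_{i<j}(\theta_i-\theta_j)\prod_i d\theta_i,$$

so that, finally, the joint distribution of the $\theta_i$ and $\phi_k$ is

$$\text{const.}\ \frac{\prod_i\theta_i^{\frac12(q-p-1)}\prod_i(1-\theta_i)^{\frac12(n-q-2)}\prod_{i<j}(\theta_i-\theta_j)\prod_{k<l}(\phi_k-\phi_l)}{\prod_k(1-\phi_k)^{\frac12 p}\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|^{\frac12}}\prod_i d\theta_i\prod_k d\phi_k.$$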


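It remains now to derive the conditional distribution, given $\eta$, which will be independent
of the population parameter. Since the set of $\theta_i$ and $\phi_k$ define $\eta$ precisely, one of them must
be replaced by $\eta$; but to preserve symmetry we shall not carry out this transformation
explicitly. The distribution of $\eta$ is

$$\text{const.}\ \eta^{\frac12(q-2)}(1-\eta)^{\frac12(n-q-2)}\,d\eta,$$

so that the conditional distribution of the $\theta_i$ and $\phi_k$, given $\eta$, is

$$\text{const.}\ \frac{\prod_i\theta_i^{\frac12(q-p-1)}\prod_i(1-\theta_i)^{\frac12(n-q-2)}\prod_{i<j}(\theta_i-\theta_j)\prod_{k<l}(\phi_k-\phi_l)\prod_i d\theta_i\prod_k d\phi_k}{\eta^{\frac12(q-2)}(1-\eta)^{\frac12(n-q-2)}\prod_k(1-\phi_k)^{\frac12 p}\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|^{\frac12}\,d\eta},$$

or, since

$$(1-\eta)\prod_k(1-\phi_k) = \prod_i(1-\theta_i),$$

$$\text{const.}\ \frac{\prod_i\theta_i^{\frac12(q-p-1)}\prod_k(1-\phi_k)^{\frac12(n-p-q-2)}\prod_{i<j}(\theta_i-\theta_j)\prod_{k<l}(\phi_k-\phi_l)\prod_i d\theta_i\prod_k d\phi_k}{\eta^{\frac12(q-2)}\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|^{\frac12}\,d\eta}.$$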



















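When $p$ exceeds $q$, only $q$ of the roots are distributed, the remaining $p-q$ roots being
identically zero. It is desirable in such cases explicitly to modify the distribution given
above, putting

$$\theta_{q+1} = \phi_{q+1} = \theta_{q+2} = \ldots = \phi_{p-1} = \theta_p = 0.$$

Then, for the remaining roots, the joint distribution is found to be

$$\text{const.}\ \frac{\prod_k\phi_k^{\frac12(p-q-2)}\prod_k(1-\phi_k)^{\frac12(n-p-q-2)}\prod_{i<j}(\theta_i-\theta_j)\prod_{k<l}(\phi_k-\phi_l)\prod_i d\theta_i\prod_k d\phi_k}{\eta^{\frac12(q-2)}\,\bigl|\prod_i\prod_k(\theta_i-\phi_k)\bigr|^{\frac12}\,d\eta}.$$

In particular, when $q = 2$, we have

$$\text{const.}\ \frac{(\phi_1\phi_2)^{\frac12(p-4)}[(1-\phi_1)(1-\phi_2)]^{\frac12(n-p-4)}(\theta_1-\theta_2)(\phi_1-\phi_2)\,d\theta_1\,d\theta_2\,d\phi_1\,d\phi_2}{\sqrt{\{-\prod(\theta_i-\phi_k)\}}\,d\eta}.$$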


These conditional distributions, being independent of the parameter, now provide the 
basis for any exact tests which are to be made. 





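(a) Two-variate problems (p = 2)
The general distribution may be used, but since the discriminant function $y$ is specified
by only one direction-cosine, it is adequately defined by $\eta$. Then

$$\eta = \frac{w_1^2\theta_1 + w_2^2\theta_2}{w_1^2 + w_2^2},$$

and on transforming from $w_1, w_2$ to $\eta, s$, the conditional distribution of $\eta$ is found to be

$$\frac{d\eta}{\pi\sqrt{\{(\theta_1-\eta)(\eta-\theta_2)\}}}.$$

The joint distribution of $\theta_1$, $\theta_2$ and $\eta$ is therefore

$$\text{const.}\ (\theta_1\theta_2)^{\frac12(q-3)}[(1-\theta_1)(1-\theta_2)]^{\frac12(n-q-3)}\,\frac{(\theta_1-\theta_2)\,d\theta_1\,d\theta_2\,d\eta}{\sqrt{\{(\theta_1-\eta)(\eta-\theta_2)\}}},$$

and the conditional distribution of $\theta_1$ and $\theta_2$ is

$$\text{const.}\ \frac{(\theta_1\theta_2)^{\frac12(q-3)}[(1-\theta_1)(1-\theta_2)]^{\frac12(n-q-3)}(\theta_1-\theta_2)\,d\theta_1\,d\theta_2}{\eta^{\frac12(q-2)}(1-\eta)^{\frac12(n-q-2)}\sqrt{\{(\theta_1-\eta)(\eta-\theta_2)\}}}.$$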





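We now make a change of variables to

$$v_1 = \frac{(\theta_1-\eta)(\eta-\theta_2)}{(1-\eta)(\eta-\theta_1\theta_2)}, \qquad v_2 = \frac{\theta_1\theta_2}{\eta}.$$

The joint distribution of $v_1$ and $v_2$ is

$$\text{const.}\ v_1^{-\frac12}(1-v_1)^{\frac12(n-q-3)}\,dv_1\ v_2^{\frac12(q-3)}(1-v_2)^{\frac12(n-q-2)}\,dv_2.$$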


It is seen that $v_1$ and $v_2$ are independently distributed, and so may be used to test
independent aspects of the data. Moreover, the distribution does not involve $\eta$, so that it also
gives the unrestricted distribution, when $\eta$ is not fixed, for any value of the population
parameter.













It is to be noted that, while $v_1$ and $v_2$ formally are symmetric functions of $\theta_1$ and $\theta_2$, they
depend on these statistics in quite different ways; $v_1$ is dependent mainly on $\theta_1$, while $v_2$
depends mainly on $\theta_2$. This is because, for any acceptable hypothesis, $\eta$ must lie in the
neighbourhood of $\theta_1$; consequently,


a,-9 
me et 
V_~ Ag. 


Hence $v_1$ is appropriate for testing the concordance of the data with the form of the
proposed discriminant function while $v_2$ gives a test for departure from collinearity. The
latter test is a rather 'fluid' one, in that it depends on the choice of the hypothetical
discriminant function and the resulting $\eta$. The justification for the use of $v_2$ is that it is
independent of the unknown parameter, being dependent instead on a statistic which, while
to some extent arbitrary, is yet determinate. Where the set of acceptable hypotheses as
determined by $v_1$ is well defined, $v_2$ will often not differ greatly from $\theta_2$; however, since $\theta_2$ is
the minimum value of $v_2$, we have a simple demonstration of the fact that the systematic
use of $\theta_2$ as a statistic with the distribution of $v_2$ to test departures from linearity tends to
underestimate significance.

The criteria have independent F distributions;

$$\frac{(n-q-1)\,v_1}{1-v_1}$$

is distributed as F with 1 and $n-q-1$ degrees of freedom, while

$$\frac{(n-q)\,v_2}{(q-1)(1-v_2)}$$

is distributed as F with $q-1$ and $n-q$ degrees of freedom. However, it seems best to test
the two aspects of the data in the manner indicated in the analysis of variance set out below,
rather than to use $v_1$ and $v_2$ as they stand.

It may be remarked here that the discriminant function y will be sometimes rejected 
because of significant discrepancy in direction; at other times because of ‘departure from 
linearity’. For any single discriminant function to be acceptable, it must clearly give 
non-significant departures for either aspect. 

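The criterion $v_1$ is not a monotonic function of $\eta$; from a zero value at $\eta = \theta_1$ it rises to
a maximum value

$$\left\{\frac{\sqrt{\theta_1}-\sqrt{\theta_2}}{1-\sqrt{(\theta_1\theta_2)}}\right\}^2$$

at $\eta = \sqrt{(\theta_1\theta_2)}$.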
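The location and height of this maximum can be checked numerically. The following sketch assumes $v_1 = (\theta_1-\eta)(\eta-\theta_2)/\{(1-\eta)(\eta-\theta_1\theta_2)\}$ and uses arbitrary illustrative values for the two roots; the grid search is only a check, not part of the paper's method.

```python
import math

def v1(theta1, theta2, eta):
    """Direction criterion for the two-variate case."""
    return (theta1 - eta) * (eta - theta2) / ((1 - eta) * (eta - theta1 * theta2))

theta1, theta2 = 0.6, 0.2     # assumed values, for illustration only

# Search a fine grid of eta strictly between theta2 and theta1.
grid = [theta2 + (theta1 - theta2) * i / 20000 for i in range(1, 20000)]
eta_best = max(grid, key=lambda e: v1(theta1, theta2, e))

eta_claim = math.sqrt(theta1 * theta2)
v1_claim = ((math.sqrt(theta1) - math.sqrt(theta2))
            / (1 - math.sqrt(theta1 * theta2))) ** 2
```

The grid maximiser `eta_best` should sit next to `eta_claim`, and the value of `v1` there should equal `v1_claim` to rounding error.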


Hence, for a sufficiently high level of significance in any example, no chosen discriminant 
function will be judged discordant with the data; in other words, no fiducial limits for the 
true discriminant function will exist. This is to be expected, since the difference between 
any two discriminant functions corresponds, not to a distance which can be increased 
indefinitely, but to an angle. 

An overall test for the adequacy of the discriminant function $y$ is provided by a method
given by Bartlett (1939). The effect of the proposed discriminant function is eliminated by
covariance, and the discriminant ratio for the adjusted variates is tested. Since, in this
case, $y$ is a linear compound of $x_1$ and $x_2$, the elimination of its effect from either has the
same result. The analysis of covariance of $x_1$, for instance, where $\mu$ is the angle between
$y$ and $x_2$, is:


























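                  Sums of squares and products                                              Adjusted sum of squares
                  $x_1^2$        $x_1 y$                 $y^2$                                                $x_1^2$
Between groups    $\theta_1$     $\theta_1\sin\mu$       $\eta = \theta_1\sin^2\mu + \theta_2\cos^2\mu$       $\left\{1 - \frac{(1-\theta_1)(1-\theta_2)}{1-\eta}\right\}\cos^2\mu$
Within groups     $1-\theta_1$   $(1-\theta_1)\sin\mu$   $1-\eta$                                             $\frac{(1-\theta_1)(1-\theta_2)}{1-\eta}\cos^2\mu$

Total             1              $\sin\mu$               1                                                    $\cos^2\mu$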

















Denoting the adjusted discriminant ratio by $u$, we have

$$u = \frac{\theta_1+\theta_2-\theta_1\theta_2-\eta}{1-\eta},$$

so that $u = v_1 + v_2 - v_1 v_2$.
A significant value of $u$ may result from a large value of $v_1$ or $v_2$ or both. The ratio

$$\frac{(n-q-1)\,u}{q(1-u)}$$

is distributed as F with $q$ and $n-q-1$ degrees of freedom.
The criteria may be set out in analysis of variance form, as follows: 














             Degrees of freedom    Sum of squares
Direction    1                     $\frac{(\theta_1-\eta)(\eta-\theta_2)}{\eta(1-\eta)} = v_1(1-v_2)$
Linearity    $q-1$                 $\frac{\theta_1\theta_2}{\eta} = v_2$
Residual     $n-q-1$               $\frac{(1-\theta_1)(1-\theta_2)}{1-\eta} = (1-v_1)(1-v_2)$

Total        $n-1$                 1














This shows that the criteria for direction and linearity may each be tested against an 
independent estimate of residual variation. 
When $q = 1$, $\theta_2 = 0$ and the analysis of variance reduces to the familiar form.
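The partition in this analysis of variance can be verified directly. The sketch below assumes the three sums of squares are $(\theta_1-\eta)(\eta-\theta_2)/\{\eta(1-\eta)\}$, $\theta_1\theta_2/\eta$ and $(1-\theta_1)(1-\theta_2)/(1-\eta)$, and checks that they add to unity and agree with the expressions in $v_1$ and $v_2$; the input values and function names are illustrative assumptions.

```python
def two_variate_criteria(theta1, theta2, eta):
    """Return (v1, v2) for the two-variate case."""
    v1 = (theta1 - eta) * (eta - theta2) / ((1 - eta) * (eta - theta1 * theta2))
    v2 = theta1 * theta2 / eta
    return v1, v2

def anova_sums(theta1, theta2, eta):
    """Return the direction, linearity and residual sums of squares (total = 1)."""
    direction = (theta1 - eta) * (eta - theta2) / (eta * (1 - eta))
    linearity = theta1 * theta2 / eta
    residual = (1 - theta1) * (1 - theta2) / (1 - eta)
    return direction, linearity, residual

theta1, theta2, eta = 0.5, 0.1, 0.4    # assumed values, for illustration only
v1, v2 = two_variate_criteria(theta1, theta2, eta)
direction, linearity, residual = anova_sums(theta1, theta2, eta)
```

With these inputs the three sums add to one, `direction` equals `v1 * (1 - v2)` and `residual` equals `(1 - v1) * (1 - v2)`, up to rounding.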








(b) Three groups (q = 2)


When there are three groups, there are two sample roots, $\theta_1$ and $\theta_2$, not identically zero,
but there are also $p-2$ zero roots. Any hypothetical discriminant function $y$ then lies in
a space of $p$ dimensions, the corresponding discriminant ratio $\eta$ may range from zero to $\theta_1$,
and the analysis given above for $p = 2$ does not apply. Here again the general distribution
may be used, but we shall adopt a simpler approach which is of interest. We put

$$s = \sum w_i^2,$$

and shall write

$$\zeta = \frac{w_1^2\theta_1 + w_2^2\theta_2}{w_1^2 + w_2^2},$$

so that $\theta_2 < \zeta < \theta_1$.


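As before, $w_1, w_2, \ldots, w_p$ are, on the null hypothesis, independent unit normal deviates.
The joint distribution of $s$, $w_1$ and $w_2$ is then found to be

$$\text{const.}\ e^{-\frac12 s}(s - w_1^2 - w_2^2)^{\frac12(p-4)}\,ds\,dw_1\,dw_2.$$

We now transform the variates to find the joint distribution of $\eta$, $\zeta$ and $s$; after integration
of this joint distribution with respect to $s$, the joint distribution of $\eta$ and $\zeta$ is given as

$$\text{const.}\ \frac{(\zeta-\eta)^{\frac12(p-4)}\,d\eta\,d\zeta}{\zeta^{\frac12(p-2)}\sqrt{\{(\theta_1-\zeta)(\zeta-\theta_2)\}}}.$$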


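The joint distribution of $\theta_1$, $\theta_2$, $\eta$ and $\zeta$ is therefore

$$\text{const.}\ \frac{(\theta_1\theta_2)^{\frac12(p-3)}[(1-\theta_1)(1-\theta_2)]^{\frac12(n-p-3)}(\zeta-\eta)^{\frac12(p-4)}(\theta_1-\theta_2)\,d\theta_1\,d\theta_2\,d\eta\,d\zeta}{\zeta^{\frac12(p-2)}\sqrt{\{(\theta_1-\zeta)(\zeta-\theta_2)\}}},$$

and the conditional distribution of $\theta_1$, $\theta_2$ and $\zeta$, given $\eta$, is

$$\text{const.}\ \frac{(\theta_1\theta_2)^{\frac12(p-3)}[(1-\theta_1)(1-\theta_2)]^{\frac12(n-p-3)}(\zeta-\eta)^{\frac12(p-4)}(\theta_1-\theta_2)\,d\theta_1\,d\theta_2\,d\zeta}{(1-\eta)^{\frac12(n-4)}\,\zeta^{\frac12(p-2)}\sqrt{\{(\theta_1-\zeta)(\zeta-\theta_2)\}}}.$$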


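We now put

$$v_1 = \frac{(\theta_1-\zeta)(\zeta-\theta_2)}{(1-\zeta)(\zeta-\theta_1\theta_2)}, \qquad v_2 = \frac{\theta_1\theta_2}{\zeta}, \qquad v_3 = \frac{\zeta-\eta}{1-\eta}.$$

The joint distribution of $v_1$, $v_2$ and $v_3$ is found to be

$$\text{const.}\ v_1^{-\frac12}(1-v_1)^{\frac12(n-p-3)}\,dv_1\ v_2^{\frac12(p-3)}(1-v_2)^{\frac12(n-p-2)}\,dv_2\ v_3^{\frac12(p-4)}(1-v_3)^{\frac12(n-p-2)}\,dv_3,$$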


so that $v_1$, $v_2$ and $v_3$ are independently distributed. From their definition it is seen that
$v_1$ and $v_2$ provide tests of direction and linearity respectively, attention being confined to
the $x_1$, $x_2$ plane, and in fact correspond to the $v_1$ and $v_2$ defined for the discussion of the case
$p = 2$. A test of the departure of the hypothetical discriminant function from the $x_1$, $x_2$
plane is provided by $v_3$. The tests cannot, however, be set out in the form of an analysis
of variance, as was possible for $p = 2$. This is because the criterion for an overall test of the
adequacy of the hypothetical discriminant function, namely,

$$\frac{(1-\theta_1)(1-\theta_2)}{1-\eta} = (1-v_1)(1-v_2)(1-v_3),$$

is distributed not as a Beta-variate but as the square of a $B(n-p-1, p-1)$ variate. This
is the result first discovered by Wilks (1932), and enables the 'residual' to be tested by
means of the F distribution with $2(p-1)$ and $2(n-p-1)$ degrees of freedom.
We can, however, set out the tests for departures in direction in one analysis of variance,
and the tests for departures from linearity in another. Departures in direction may lie in
the $p$ space, while departures from linearity are observable only in the $x_1$, $x_2$ plane.



































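Direction                      Degrees of freedom    Sum of squares
In the $x_1$, $x_2$ plane      1                     $\frac{(\theta_1-\zeta)(\zeta-\theta_2)}{(1-\eta)(\zeta-\theta_1\theta_2)} = v_1(1-v_3)$
In the $p-2$ space             $p-2$                 $\frac{\zeta-\eta}{1-\eta} = v_3$
Residual                       $n-p-1$               $\frac{\zeta(1-\theta_1)(1-\theta_2)}{(1-\eta)(\zeta-\theta_1\theta_2)} = (1-v_1)(1-v_3)$
Total                          $n-2$                 1

Linearity                      Degrees of freedom    Sum of squares
Direction in the plane         1                     $\frac{(\theta_1-\zeta)(\zeta-\theta_2)}{\zeta(1-\zeta)} = v_1(1-v_2)$
Linearity                      $p-1$                 $\frac{\theta_1\theta_2}{\zeta} = v_2$
Residual                       $n-p-1$               $\frac{(1-\theta_1)(1-\theta_2)}{1-\zeta} = (1-v_1)(1-v_2)$
Total                          $n-1$                 1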

















These two analyses enable detailed tests to be made of the various departures from the 
hypothetical discriminant function. It must be recognized, however, that these tests are 
not independent. 
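The relations linking the three criteria to the overall test function can be checked numerically. The sketch assumes $v_1 = (\theta_1-\zeta)(\zeta-\theta_2)/\{(1-\zeta)(\zeta-\theta_1\theta_2)\}$, $v_2 = \theta_1\theta_2/\zeta$ and $v_3 = (\zeta-\eta)/(1-\eta)$; the numerical inputs are arbitrary illustrative values with $\eta < \zeta$ and $\theta_2 < \zeta < \theta_1$.

```python
def three_group_criteria(theta1, theta2, zeta, eta):
    """Return (v1, v2, v3) for the three-group case (q = 2)."""
    v1 = (theta1 - zeta) * (zeta - theta2) / ((1 - zeta) * (zeta - theta1 * theta2))
    v2 = theta1 * theta2 / zeta
    v3 = (zeta - eta) / (1 - eta)
    return v1, v2, v3

theta1, theta2, zeta, eta = 0.5, 0.2, 0.4, 0.3   # assumed values
v1, v2, v3 = three_group_criteria(theta1, theta2, zeta, eta)

# Overall criterion, and the first line of the 'direction' analysis.
overall = (1 - theta1) * (1 - theta2) / (1 - eta)
plane_ss = (theta1 - zeta) * (zeta - theta2) / ((1 - eta) * (zeta - theta1 * theta2))
```

Here `overall` should equal `(1 - v1) * (1 - v2) * (1 - v3)` and `plane_ss` should equal `v1 * (1 - v3)`, up to rounding.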

The limiting case when n tends to infinity gives an exact analysis of variance. Such a case 
arises in practice in the canonical analysis of a set of variates with known covariance matrix. 
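If, modifying the notation slightly, we put

$$n\theta_1 \to S_1, \qquad n\theta_2 \to S_2, \qquad n\zeta \to S_\zeta, \qquad n\eta \to S_\eta,$$

$$v_1 = \frac{(S_1-S_\zeta)(S_\zeta-S_2)}{S_\zeta}, \qquad v_2 = \frac{S_1 S_2}{S_\zeta}, \qquad v_3 = S_\zeta - S_\eta,$$

the joint distribution of $v_1$, $v_2$ and $v_3$ comes out as

$$\text{const.}\ e^{-\frac12(v_1+v_2+v_3)}\,v_1^{-\frac12}\,v_2^{\frac12(p-3)}\,v_3^{\frac12(p-4)}\,dv_1\,dv_2\,dv_3,$$

and the analysis of variance is then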














                                        Degrees of freedom    Sum of squares
Direction in the plane                  1                     $\frac{(S_1-S_\zeta)(S_\zeta-S_2)}{S_\zeta} = v_1$
Direction in $p-2$ space                $p-2$                 $S_\zeta - S_\eta = v_3$
Linearity                               $p-1$                 $\frac{S_1 S_2}{S_\zeta} = v_2$
Departures from hypothetical            $2(p-1)$              $S_1 + S_2 - S_\eta = v_1 + v_2 + v_3$
  discriminant function
Hypothetical discriminant function      2                     $S_\eta$
Total                                   $2p$                  $S_1 + S_2$

















This analysis is applied to a practical example in a forthcoming paper (Williams, 1952). 
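It now remains to relate the results for $q = 2$ to those obtained in the analysis of the
general case. To do this, we express $v_1$, $v_2$ and $v_3$ in terms of the $\theta_i$ and $\phi_k$. We have

$$1-\eta = \frac{(1-\theta_1)(1-\theta_2)}{(1-\phi_1)(1-\phi_2)},$$

$$w_1^2 = \frac{s(1-\eta)(\theta_1-\phi_1)(\theta_1-\phi_2)}{(1-\theta_1)(\theta_1-\theta_2)\,\theta_1},$$

$$w_2^2 = \frac{s(1-\eta)(\theta_2-\phi_1)(\theta_2-\phi_2)}{(1-\theta_2)(\theta_2-\theta_1)\,\theta_2},$$

so that

$$w_1^2 + w_2^2 = \frac{s(1-\eta)\,[\theta_1\theta_2(1-\phi_1)(1-\phi_2) - \phi_1\phi_2(1-\theta_1)(1-\theta_2)]}{\theta_1\theta_2(1-\theta_1)(1-\theta_2)}
= s\left\{1 - \frac{\phi_1\phi_2(1-\theta_1)(1-\theta_2)}{\theta_1\theta_2(1-\phi_1)(1-\phi_2)}\right\}.$$

For brevity we write

$$P = \theta_1+\theta_2-\phi_1-\phi_2,$$
$$Q = \theta_1\theta_2-\phi_1\phi_2,$$
$$R = \theta_1\theta_2(\phi_1+\phi_2) - \phi_1\phi_2(\theta_1+\theta_2);$$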














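then it can be shown that

$$R = (\theta_1+\theta_2)Q - \theta_1\theta_2 P = (\phi_1+\phi_2)Q - \phi_1\phi_2 P,$$

$$PR - Q^2 = -\prod(\theta_i-\phi_k) = (\theta_1-\phi_1)(\theta_1-\phi_2)(\phi_1-\theta_2)(\theta_2-\phi_2).$$

We find that

$$\zeta = \theta_1\theta_2\,\frac{P-Q}{Q-R},$$

$$1-\zeta = (1-\theta_1)(1-\theta_2)\,\frac{Q}{Q-R},$$

$$v_2 = \frac{Q-R}{P-Q},$$

$$\zeta - \theta_1\theta_2 = \theta_1\theta_2\,\frac{P-2Q+R}{Q-R},$$

$$v_1 = \frac{PR-Q^2}{Q(P-2Q+R)},$$

$$v_1(1-v_2) = \frac{PR-Q^2}{Q(P-Q)} = \frac{-\prod(\theta_i-\phi_k)}{Q(P-Q)},$$

$$(1-v_1)(1-v_2) = \frac{Q-R}{Q},$$

$$v_3 = \phi_1\phi_2\,\frac{P-Q}{Q-R},$$

$$1-v_3 = (1-\phi_1)(1-\phi_2)\,\frac{Q}{Q-R},$$

$$v_1(1-v_3) = (1-\phi_1)(1-\phi_2)\,\frac{PR-Q^2}{(Q-R)(P-2Q+R)},$$

$$(1-v_1)(1-v_3) = (1-\phi_1)(1-\phi_2)\,\frac{P-Q}{P-2Q+R}.$$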
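These algebraic relations are easy to confirm in exact arithmetic. The sketch below builds $\zeta$ and $\eta$ from their definitions, with $w_1^2$ and $w_2^2$ taken proportional to $(\theta_i-\phi_1)(\theta_i-\phi_2)/\{(1-\theta_i)\,\theta_i\prod_{j\ne i}(\theta_i-\theta_j)\}$, and compares the criteria with the expressions in $P$, $Q$ and $R$. The particular roots are arbitrary illustrative values satisfying $\theta_1 > \phi_1 > \theta_2 > \phi_2$.

```python
from fractions import Fraction as F

t1, t2 = F(4, 5), F(2, 5)      # theta_1, theta_2 (assumed values)
f1, f2 = F(1, 2), F(1, 5)      # phi_1, phi_2 (assumed values)

P = t1 + t2 - f1 - f2
Q = t1 * t2 - f1 * f2
R = t1 * t2 * (f1 + f2) - f1 * f2 * (t1 + t2)

# Squared direction ratios up to the common factor s(1 - eta).
w1sq = (t1 - f1) * (t1 - f2) / ((1 - t1) * (t1 - t2) * t1)
w2sq = (t2 - f1) * (t2 - f2) / ((1 - t2) * (t2 - t1) * t2)

zeta = (w1sq * t1 + w2sq * t2) / (w1sq + w2sq)
eta = 1 - (1 - t1) * (1 - t2) / ((1 - f1) * (1 - f2))

v1 = (t1 - zeta) * (zeta - t2) / ((1 - zeta) * (zeta - t1 * t2))
v2 = t1 * t2 / zeta
v3 = (zeta - eta) / (1 - eta)
```

Since every quantity is a `Fraction`, the comparisons with the closed forms in $P$, $Q$, $R$ can be made with exact equality rather than a tolerance.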


(c) Three or more variates; more than three groups (p, q ≥ 3)


When p and q are greater than 2, the joint distribution of the latent roots is more com- 
plicated. A method of analysis to give tests for direction and linearity is given by Bartlett 
(1951). 

For an overall test of significance, the method of Bartlett already referred to may be
used. On the hypothesis to be tested, the product of the within-group ratios, after adjustment
by covariance with the hypothetical discriminant function, will be distributed independently
of the parameter. The test function thus derived is

$$1-u = \frac{\prod_i(1-\theta_i)}{1-\eta},$$

which has already appeared in the simpler cases discussed above. When $p = 2$ it has
a Beta-distribution, and when $p = 3$ or $q = 2$ (that is, when it is the product of two adjusted
ratios) it is distributed as the square of a Beta-variate. For $p = 3$ the distribution is

$$\text{const.}\ v^{q-1}(1-v)^{n-q-3}\,dv,$$

where $(1-v)^2 = (1-u)$, so that

$$\frac{(n-q-2)\,v}{q(1-v)}$$

has the F distribution with $2q$ and $2(n-q-2)$ degrees of freedom. When $q = 2$ the
distribution is

$$\text{const.}\ v^{p-2}(1-v)^{n-p-2}\,dv,$$

so that

$$\frac{(n-p-1)\,v}{(p-1)(1-v)}$$

has the F distribution with $2(p-1)$ and $2(n-p-1)$ degrees of freedom.

For larger values of p and q the exact analytical form of the distribution is not known, 
but Bartlett (1938) has developed approximations using the x? distribution, which are 
satisfactory in most cases for making significance tests (see also Hsu, 1940). A more serious 
criticism of the criterion is that, since the same value of 7 corresponds to widely differing 
forms of the hypothetical discriminant function, it is difficult to interpret a significant result 
given by the overall test. 


4. NUMERICAL EXAMPLE 


To illustrate the use of the tests outlined above, we apply them to some data presented by
Bartlett (1947b, pp. 177-9). This is an experiment on the effects of eight treatments, each
replicated eight times, on yield of straw ($x_1$ in Bartlett's notation) and of grain ($x_2$). We shall
not question the relevance of discriminant analysis in this example, though it seems that
for practical purposes some linear combination of $x_1$ and $x_2$ representing total value, rather
than one maximizing the discriminant ratio, will be the appropriate one to study.

After eliminating block differences, we have

$$n = 56, \qquad p = 2, \qquad q = 7,$$

and Bartlett gives (in our notation)

$$\theta_1 = 0.47698, \qquad \theta_2 = 0.05934,$$

the discriminant function corresponding to $\theta_1$ being
$$x_2 - 0.5352\,x_1.$$


It is reasonable to inquire first of all whether $x_2$ alone is a satisfactory discriminant
function (in other words, whether the coefficient of $x_1$ differs significantly from zero). For
$x_2$ we have $\eta = 0.31570$,








so that the sums of squares for the analysis of variance are as follows:

Direction: $\frac{0.16128 \times 0.25636}{0.31570 \times 0.68430} = 0.19139$,

Linearity: $\frac{0.47698 \times 0.05934}{0.31570} = 0.08965$,

Residual: $\frac{0.52302 \times 0.94066}{0.68430} = 0.71896$,

and the analysis of variance is

             Degrees of freedom    Sum of squares    Mean square
Direction     1                    0.19139           0.19139
Linearity     6                    0.08965           0.01494
Residual     48                    0.71896           0.01498
Total        55                    1.00000




















The data are discordant with the proposed discriminant function $x_2$, with significance at
the 1% level, so that $x_2$ alone is unsatisfactory.

We next inquire what set of discriminant functions is acceptable in direction, assuming
no departures from linearity. If the tabular value of the F distribution for 1 and 48 degrees
of freedom at the required significance level is $F$, we have for the limiting value of $\eta$,

$$\frac{48(\theta_1-\eta)(\eta-\theta_2)}{\eta(1-\theta_1)(1-\theta_2)} = F.$$

At the 5% level, $F = 4.0427$, giving $\eta = 0.42890$; at the 1% level, $F = 7.1942$, giving
$\eta = 0.39002$.
A discriminant function for which the discriminant ratio is less than either of these values
will be judged discordant with the data, at the corresponding level of significance.
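The arithmetic of this example can be retraced with a short script. It is a sketch only: it takes the quoted values of $\theta_1$, $\theta_2$, $\eta$ and the tabular $F$ values as given, assumes the limiting equation is $48(\theta_1-\eta)(\eta-\theta_2) = F\,\eta(1-\theta_1)(1-\theta_2)$, and solves it as a quadratic in $\eta$, taking the larger root (the one near $\theta_1$). The function name `limiting_eta` is a hypothetical helper.

```python
import math

theta1, theta2 = 0.47698, 0.05934    # sample roots quoted in the text
n, q = 56, 7
eta = 0.31570                        # discriminant ratio for x2 alone

direction = (theta1 - eta) * (eta - theta2) / (eta * (1 - eta))
linearity = theta1 * theta2 / eta
residual = (1 - theta1) * (1 - theta2) / (1 - eta)
resid_df = n - q - 1                 # 48 degrees of freedom
f_direction = direction / (residual / resid_df)

def limiting_eta(f_crit):
    """Larger root of resid_df*(t1 - eta)*(eta - t2) = f_crit*eta*(1 - t1)*(1 - t2)."""
    c = f_crit * (1 - theta1) * (1 - theta2)
    b = resid_df * (theta1 + theta2) - c
    disc = b * b - 4 * resid_df * resid_df * theta1 * theta2
    return (b + math.sqrt(disc)) / (2 * resid_df)

eta_5pct = limiting_eta(4.0427)
eta_1pct = limiting_eta(7.1942)
```

The computed sums of squares and limiting ratios should agree with the tabulated figures to the number of decimals shown in the text.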


The author is indebted to Prof. R. A. Fisher and to Prof. M. S. Bartlett for much helpful
advice. The work described in this paper was carried out as part of the research programme
of the Section of Mathematical Statistics, C.S.I.R.O.


REFERENCES 
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Camb. Phil. Soc.
34, 33.

Bartlett, M. S. (1939). A note on tests of significance in multivariate analysis. Proc. Camb. Phil. Soc.
35, 180.

Bartlett, M. S. (1947a). The general canonical correlation distribution. Ann. Math. Statist. 18, 1.

Bartlett, M. S. (1947b). Multivariate analysis. J.R. Statist. Soc. B, 9, 176.

Bartlett, M. S. (1951). The goodness of fit of a single hypothetical discriminant function in the case
of several groups. Ann. Eugen., Lond., 16, 199.

Fisher, R. A. (1934). Two new properties of mathematical likelihood. Proc. Roy. Soc. A, 139, 343.

Fisher, R. A. (1939). The sampling distribution of some statistics obtained from non-linear equations.
Ann. Eugen., Lond., 9, 238.

Fisher, R. A. (1949). The Design of Experiments, 5th ed. Edinburgh: Oliver and Boyd.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321.

Hsu, P. L. (1939). On the distribution of roots of certain determinantal equations. Ann. Eugen.,
Lond., 9, 250.

Hsu, P. L. (1940). On the limiting distribution of the canonical correlations. Biometrika, 32, 38.

Nanda, N. (1948). Limiting distribution of a root of a determinantal equation. Ann. Math. Statist.
19, 340.

Pitman, E. J. G. (1939a). The estimation of location and scale parameters of a continuous population
of any given form. Biometrika, 30, 391.

Pitman, E. J. G. (1939b). Tests of hypotheses concerning location and scale parameters. Biometrika,
31, 200.

Roy, S. N. (1939). p-Statistics or some generalizations in analysis of variance appropriate to
multivariate problems. Sankhyā, 4, 381.

Roy, S. N. (1942a). The sampling distribution of p-statistics and certain allied statistics on the
non-null hypothesis. Sankhyā, 6, 15.

Roy, S. N. (1942b). Analysis of variance for multivariate normal populations. Sankhyā, 6, 35.

Roy, S. N. (1946). Multivariate analysis of variance: the sampling distribution of the numerically
largest of the p-statistics on the non-null hypothesis. Sankhyā, 8, 15.

Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471.

Williams, E. J. (1952). The interpretation of interactions in factorial experiments. Biometrika, 39, 65.













THE CONSTRUCTION OF BALANCED DESIGNS FOR EXPERIMENTS 
INVOLVING SEQUENCES OF TREATMENTS 


By H. D. PATTERSON 
Rothamsted Experimental Station 


1. INTRODUCTION 


This paper is concerned with the type of experimental design in which treatments are 
applied to the experimental material in a number of successive periods, each experimental 
unit receiving a different treatment in each period. Although some of the simpler examples 
of these designs have been in practical use in at least one field of research for several years, 
a systematic discussion of their general properties and the enumeration of possible designs 
does not appear to have been previously attempted. 

In order that the effects of treatments and the errors can be estimated by
a reasonably simple statistical analysis, certain elements of balance, set out below, are
required. The requirements depend largely upon which residual effects are to be estimated.
For example, any Latin square arrangement, in which the rows represent the different 
periods so that a column of symbols refers to a sequence of treatments, is suitable for the 
case in which residual effects are negligible. 

When first residual effects (i.e. the effects of treatments in the period after application) 
are to be estimated the class of available designs is more restricted. A simple example has 
been described by Cochran, Autrey & Cannon (1941) in connexion with feeding experiments 
on dairy cows. This arrangement, shown in Fig. 1, consists of two particular Latin squares. 








Periods        Sequences
I         1  2  3     1  2  3
II        2  3  1     3  1  2
III       3  1  2     2  3  1





























Fig. 1. Change-over design for three treatments. 


Arrangements of this sort are of limited value, as v, the number of treatments, is restricted 
to be equal to k, the number of periods. Designs for v > k contain incomplete sequences and 
are related to the balanced incomplete block designs introduced by Yates (1936). For 
example, any Youden square can be used if residual effects are negligible. 

Even if there are residual effects it is sometimes possible to use the less restricted designs, 
introducing intermediate periods with a uniform non-experimental treatment between the 
experimental periods. Estimates of residual effects are obtained from the measurements in 
the intermediate periods. Often, however, this type of arrangement cannot be used. It will 
not be considered further in this paper. 

The application of change-over designs to animal feeding experiments has been surveyed 
by the present author (Patterson, 1951), but without particular reference to problems of 
construction. Reference may be made to this paper on questions of analysis. 
















2. GENERAL PLAN OF THE PAPER 


In §§3 and 4 the conditions of balance are set out for the case in which first residual effects 
are estimated. It is shown how these conditions are related to the equations of estimation. 
A necessary but not sufficient relationship between numbers of treatments, periods and 
experimental units provides the basis of a table of possible designs for not more than six 
periods or sixty experimental units. 

The remaining sections are devoted to the methods of construction. Many designs exist 
for which it is sufficient to define a number of leading sequences, the remaining sequences 
being obtained by a simple process. The construction of these designs is first discussed in 
general terms. Application is then made to: 

(1) designs for any $v$ equal to $k$, the number of periods, requiring $v$ or $2v$ units depending
on whether $v$ is even or odd (§6);

(2) designs for $v$ equal to a prime or power of a prime, any $k$ not greater than $v$, and requiring
$v(v-1)$ units (§7);

(3) designs for $v$ a prime of the form $4n+3$, $k$ equal to 3, and requiring $\frac12[v(v-1)]$ units (§8);

(4) designs for $v$ a prime of the form $4n+3$, $k$ equal to $\frac12(v+1)$ and requiring $2v$ units (§11);

(5) designs for $v$ a prime of the form $4n+3$, odd $k$, and requiring $\frac12[v(v-1)]$ units (§13).

A number of other designs are also obtained, and two of the listed three period designs 
are found to be non-existent. 

Reference will be made several times in the course of this paper to various mathematical 
systems, and in particular to finite groups and finite fields. For convenience these systems, 
which contain finite numbers of elements, are briefly discussed here. In the case of groups 
some rule of combination, such as addition or multiplication, is defined. The combination 
of any two elements of the group, having regard to order, uniquely defines an element of 
the group. The group contains an identity which, when combined with any element of the 
group, leaves that element unaltered. To every element there corresponds an inverse, also 
in the group, such that the combination of the element and its inverse yields the identity. 
Finally, the associative law holds with respect to the rule of combination. If, in addition, 
the rule is commutative, the group is Abelian. The identities of additive and multiplicative
groups can conveniently be represented by 0 and 1 respectively, and the inverses of an
element $a$ by $-a$ and $a^{-1}$. A multiplicative Abelian group which, together with an element
0, forms an additive Abelian group such that the distributive laws are satisfied, is called
a finite field. It is known that finite fields must contain $P^N$ elements, where $P$ is any prime
and $N$ any positive integer.


3. BALANCED DESIGNS FOR CASES WHEN RESIDUAL EFFECTS ARE NEGLIGIBLE 


Whilst we are not primarily concerned with experiments in which residual effects are 
negligible, it is convenient to consider these first. A large number of arrangements is already 
available. These include Latin squares and Youden squares. The latter are balanced 
incomplete block arrangements, with the additional restriction that each treatment occurs 
just once in each position of the blocks. Extensions are also available in which each treatment 
occurs two, three, ... times in each position. 
The conditions for balance can be stated as follows: 
I. No treatment symbol occurs in a given sequence more than once. 
II. Each symbol occurs in a given period an equal number of times. 


III. Every two treatment symbols occur together in the same number of sequences. This 
number, in the usual notation, is A. 

A design consists of $b$ sequences, and it is obvious that $b/v$ must be integral. We have the
restriction on $b/v$, implied by condition III, that

$$\frac{b}{v}\,k(k-1) \equiv 0 \pmod{v-1}. \qquad (1)$$

A design does not necessarily exist for the minimum integral value of b/v satisfying (1). 

Many of the problems involved in the construction of balanced incomplete block designs 
and the Youden-square type of design have been solved. A list of combinatorial solutions 
has been provided by Fisher & Yates (1948), and Bose (1939) has discussed a number of 
methods of construction. 


4. BALANCED DESIGNS WHEN THERE ARE RESIDUAL EFFECTS OF TREATMENTS 


The effects due to a treatment in periods after that of application are described as residual 
effects. In this paper we primarily consider the case in which: 

(1) First residual effects only need be considered, i.e. the effect of the treatment on the 
measurement in the period two after the period of application can be neglected. 

(2) Residual effects are independent of the treatment applied in the period in which they 
are observed. In this case direct and residual effects are said to be additive. For example, in 
the sequences 1, 2, 3; 2, 1, 3, the measurement of the first residual effect of treatment 1 is 
made in the second period of the first sequence and the third period of the second sequence. 
The treatments applied in these periods are treatments 2 and 3, but the residual effect is 
the same in both cases. If d is the direct effect of a treatment, and r the first residual effect, 
then the measurements in the two sequences are on $d_1,\ d_2+r_1,\ d_3+r_2$; $d_2,\ d_1+r_2,\ d_3+r_1$.
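The additive model in this example can be written out mechanically. The sketch below lists, for each period of a sequence, the direct effect of the treatment applied and the first residual effect of the treatment applied in the preceding period; the function name `expected_effects` and the string labels are illustrative assumptions.

```python
def expected_effects(sequence):
    """Return, per period, the effects entering the measurement as a label."""
    terms = []
    for period, treatment in enumerate(sequence):
        term = [f"d{treatment}"]
        if period > 0:
            # First residual effect of the treatment applied one period earlier.
            term.append(f"r{sequence[period - 1]}")
        terms.append(" + ".join(term))
    return terms

first = expected_effects((1, 2, 3))
second = expected_effects((2, 1, 3))
```

For the two sequences of the text this gives `d1, d2 + r1, d3 + r2` and `d2, d1 + r2, d3 + r1`.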

Denote by

$\bar{x}_{d1}$, the mean for treatment 1 in periods of application,

$\bar{s}_{d1}$, the mean for sequences in which treatment 1 is applied,

$\bar{x}_{r1}$, the mean in periods one after the application of treatment 1,

$\bar{s}_{r1}$, the mean for sequences in which treatment 1 is applied in one of the first $k-1$
periods where $k$ is the number of periods.

If a design satisfies the conditions I-III and in addition conditions IV-VII set out below, 
estimates of differences are obtained in the form: 

$$A(d_1-d_2) + B(r_1-r_2) = (\bar{x}_{d1}-\bar{x}_{d2}) - (\bar{s}_{d1}-\bar{s}_{d2}), \qquad (2)$$
$$B(d_1-d_2) + C(r_1-r_2) = (\bar{x}_{r1}-\bar{x}_{r2}) - (\bar{s}_{r1}-\bar{s}_{r2}), \qquad (3)$$
where A, B, C depend on the design. 

Conditions IV-VII are: 

IV. Each ordered succession of two treatment symbols should occur equally often in
sequences. For example in Fig. 1 treatment 1 immediately follows each of treatments 2 and
3 at some stage.

V. Every two treatment symbols occur together in the same number of curtailed 
sequences formed by omitting the final period. 

VI. In those sequences in which a given treatment occurs in the final period the other 
treatments occur equally often. 

VII. In those sequences in which a given treatment occurs in any but the final period 
each other treatment occurs equally often in the final period. 
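Some of these conditions are simple counting statements and can be checked by machine. As a sketch, the code below encodes the six sequences of Fig. 1 (read down the columns) and checks conditions I, II and IV; the function names are illustrative, and the remaining conditions are not tested here.

```python
from collections import Counter
from itertools import permutations

# Columns of Fig. 1: each tuple is the sequence of treatments given to one unit.
design = [(1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 3, 2), (2, 1, 3), (3, 2, 1)]

def no_repeats(design):
    """Condition I: no treatment occurs twice in a sequence."""
    return all(len(set(seq)) == len(seq) for seq in design)

def period_counts_equal(design):
    """Condition II: each treatment occurs equally often in every period."""
    treatments = sorted({t for seq in design for t in seq})
    for period in zip(*design):
        counts = Counter(period)
        if len({counts[t] for t in treatments}) != 1:
            return False
    return True

def succession_counts(design):
    """Condition IV: count each ordered pair (earlier, next) of successive treatments."""
    return Counter((a, b) for seq in design for a, b in zip(seq, seq[1:]))

successions = succession_counts(design)
```

In this design every ordered pair of distinct treatments occurs in succession the same number of times (twice).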







Condition V is automatically satisfied with conditions III, VI and VII, but it should be
noted that conditions I-V are not sufficient.


The following argument supports the statement that these conditions lead to equations
of the type (2) and (3). Restrict the direct and residual effects so that

$$d_1 + d_2 + \ldots + d_v = 0,$$
$$r_1 + r_2 + \ldots + r_v = 0.$$

The conditions III-VII ensure that the only direct and residual effects involved in
$\bar{x}_{d1}$, $\bar{x}_{r1}$, $\bar{s}_{d1}$ and $\bar{s}_{r1}$ are $d_1$ and $r_1$. The following diagram shows which conditions are effective
in this respect:

                       Effects involved
                    $d_1$          $r_1$
$\bar{x}_{d1}$       .             IV
$\bar{x}_{r1}$       IV            .
$\bar{s}_{d1}$       III           III, VII
$\bar{s}_{r1}$       V, VII        V
















Thus condition IV ensures that $\bar{x}_{d1}$ is free of all residual effects but $r_1$, and $\bar{x}_{r1}$ is free of all
direct effects but $d_1$.

The restriction on $b/v$ follows from condition IV and is that

$$\frac{b}{v}\,(k-1) \equiv 0 \pmod{v-1}. \qquad (4)$$


Designs have been sought for $k = 3, 4, 5, 6$ requiring 60 or fewer units. Minimum values of
$b/v$ obtained from (4) are given in Table 1, but it does not necessarily follow that designs
are available for this minimum value.
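The arithmetic behind such a minimum is elementary. As a sketch, and assuming restriction (4) reads $(b/v)(k-1) \equiv 0 \pmod{v-1}$, the smallest admissible $b/v$ is $(v-1)/\gcd(k-1, v-1)$; the function name below is hypothetical, and the tabulated values may differ from this figure where other requirements intervene.

```python
from math import gcd

def min_units_per_treatment(v, k):
    """Smallest positive integer m = b/v with m*(k-1) divisible by v-1."""
    return (v - 1) // gcd(k - 1, v - 1)

# A few cases for v = 7 treatments and k = 3, 4, 5, 6 periods.
row_v7 = [min_units_per_treatment(7, k) for k in (3, 4, 5, 6)]
```

For seven treatments this gives 3, 2, 3 and 6 for $k = 3, 4, 5, 6$.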


Table 1. Designs balanced for first residual effects 











            k = 3             k = 4             k = 5             k = 6
  v     b/v   Section     b/v   Section     b/v   Section     b/v   Section

  3      2    6, 7         -      -          -      -          -      -
  4      -     -           1      6          -      -          -      -
  5      2    †9           4      7          2      6          -      -
  6      5     *           -      -          5      *          1      6
  7      3    5, 8         2     11          3     13          6      7
  8      7     7           7      7          7      7          7      7
  9      4    †9           -      -          2      *          -      -
 10      -     -           3      *          -      -          -      -
 11      5     8           -      -          5     13          2     11
 13      -     -           4     10          3     12          -      -
 16      -     -           -      -          -      -          3      *



































k, no. of periods. v, no. of treatments. 06, no. of units. 
* No design found. + Design non-existent. 




Designs are also available for 








v k b/v Section 
6 3 10 10 
10 4 6 10 




















In addition to the designs of Table 1, a number outside the range have been obtained and
are mentioned in the appropriate sections. 


5. METHOD OF CONSTRUCTION USING DIFFERENCES 


Consider a design for seven treatments and three periods such as that based on three Latin 
rectangles and shown in Fig. 2. The treatments are represented by the non-negative residues 
(mod 7). Reference to the conditions I-VII shows that the design is balanced. The structure 
is of considerable interest, each sequence of a given rectangle being obtained from the 
previous sequence by replacing symbols in the order 0 1 2 3 4 5 6 0. A solution of this sort is
described as a cyclic solution. It should be noted that for any pair of rows in a given 
rectangle, the difference between treatment symbols in every sequence is the same. 








Periods                              Sequences
I       0 1 2 3 4 5 6     0 1 2 3 4 5 6     0 1 2 3 4 5 6
II      5 6 0 1 2 3 4     3 4 5 6 0 1 2     6 0 1 2 3 4 5
III     6 0 1 2 3 4 5     5 6 0 1 2 3 4     3 4 5 6 0 1 2
        1st rectangle     2nd rectangle     3rd rectangle

















Fig. 2. Three period change-over design for seven treatments, based on Latin rectangles. 


In this section the general problem of obtaining solutions for which it is sufficient to list 
b/v initial sequences, or sets of differences, is discussed. Consider a design consisting of 
b/v Latin rectangles each having k rows and v columns. Represent each treatment by one 
of the elements of an additive Abelian group of order v. One treatment may therefore be 
represented by 0. The rectangles are of special type such that the differences between 
successive rows of the ith rectangle are $\delta_{1i}, \delta_{2i}, \ldots, \delta_{k-1,i}$, $i = 1, 2, \ldots, b/v$. The sets of
differences, one set to each rectangle, must be such that

(1) No two treatments of the leading sequence in the ith rectangle

$$0,\ \delta_{1i},\ \delta_{1i}+\delta_{2i},\ \ldots,\ \delta_{1i}+\delta_{2i}+\cdots+\delta_{k-1,i} \qquad (i = 1, 2, \ldots, b/v)$$

are the same. With the method of construction of the remaining sequences given below this
condition ensures that Latin rectangles are obtained.

(2) Each difference is some non-zero element of the additive Abelian group of treatment
symbols. The $\delta$'s do not necessarily define the group apart from the identity; thus some of
the $\delta$'s may be the same. As yet we do not even say that all the non-zero elements of the
group are represented in the $\delta$'s, although it will be shown later that this is a necessary
condition for balance. The remaining sequences of a rectangle are obtained by adding the 
non-zero elements of the group of treatment symbols in turn to the symbols of the leading 
sequence, keeping the same order. 
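
The construction just described can be sketched in a few lines for the special case in which the group is the residues mod v. The function names are hypothetical, and the check at the end covers only condition IV (how often each treatment immediately follows each other treatment), applied to the differences (5, 1), (3, 2), (6, 4) of the seven-treatment design of Fig. 2.

```python
from collections import Counter
from itertools import accumulate


def rectangle(diffs, v):
    """Sequences (columns) of one Latin rectangle over the residues mod v."""
    leading = [0] + [t % v for t in accumulate(diffs)]
    if len(set(leading)) != len(leading):
        raise ValueError("leading sequence repeats a treatment")
    # Remaining sequences: add each group element in turn to the leading one.
    return [[(t + w) % v for t in leading] for w in range(v)]


def design(diff_sets, v):
    """All sequences of the design, one rectangle per set of differences."""
    return [seq for diffs in diff_sets for seq in rectangle(diffs, v)]


def follow_counts(sequences):
    """Number of times treatment b immediately follows treatment a."""
    return Counter((a, b) for s in sequences for a, b in zip(s, s[1:]))


fig2 = design([(5, 1), (3, 2), (6, 4)], v=7)
counts = follow_counts(fig2)
print(len(fig2), "sequences;", len(counts), "ordered pairs;", set(counts.values()))
```

Listing only b/v short sets of differences, rather than all b sequences, is what makes designs of this type easy to tabulate and to check.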










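It will be convenient in the following discussion to consider the sequences which include 0.
These sequences are given by the k columns of the array shown below and are obtained
by adding the k elements $0, -\delta_{1i}, -(\delta_{1i}+\delta_{2i}), \ldots, -(\delta_{1i}+\delta_{2i}+\cdots+\delta_{k-1,i})$ in turn to the
elements of the leading sequence. An example of such an array is given by the first, third
and fifth columns of the second rectangle of Fig. 2.

$$
\begin{array}{lllll}
0 & -\delta_{1i} & -(\delta_{1i}+\delta_{2i}) & \cdots & -(\delta_{1i}+\delta_{2i}+\cdots+\delta_{k-1,i}) \\
\delta_{1i} & 0 & -\delta_{2i} & \cdots & -(\delta_{2i}+\cdots+\delta_{k-1,i}) \\
\delta_{1i}+\delta_{2i} & \delta_{2i} & 0 & \cdots & -(\delta_{3i}+\cdots+\delta_{k-1,i}) \\
\delta_{1i}+\delta_{2i}+\delta_{3i} & \delta_{2i}+\delta_{3i} & \delta_{3i} & \cdots & -(\delta_{4i}+\cdots+\delta_{k-1,i}) \\
\vdots & \vdots & \vdots & & \vdots \\
\delta_{1i}+\delta_{2i}+\cdots+\delta_{k-1,i} & \delta_{2i}+\delta_{3i}+\cdots+\delta_{k-1,i} & \delta_{3i}+\cdots+\delta_{k-1,i} & \cdots & 0
\end{array}
$$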

A number of conditions must be imposed on the choice of the $\delta$ so that the conditions for
balance are satisfied. Conditions I and II are obviously satisfied as Latin rectangles are used.
Denote by $\delta_1$, for example, all the elements $\delta_{1i}$, $i = 1, \ldots, b/v$, with the convention that any
operation on the $\delta_1$ is to be interpreted as carried out on each $\delta_{1i}$. Thus the set $\delta_1 + \delta_2$ is to be
interpreted as the b/v elements $\delta_{1i} + \delta_{2i}$ taken over all i, and not, as is usual in the notation
of sets, the set of elements belonging to at least one of the sets $\delta_1$ and $\delta_2$. Similarly, denote
the differences $\delta_{1i}, \delta_{2i}, \ldots, \delta_{k-1,i}$ by $\alpha_i$, and the set of all the $\alpha_i$ by $\alpha$. Also let the set of all the
elements to the left of the leading diagonal of the above diagram be denoted by $\beta_i$, and the
set of all the $\beta_i$ by $\beta$. The set of cumulative sums starting with $\delta_{k-1,i}$, i.e.

$$(\delta_{k-1,i},\ \delta_{k-2,i}+\delta_{k-1,i},\ \ldots,\ \delta_{1i}+\delta_{2i}+\cdots+\delta_{k-1,i})$$

is denoted by $\gamma_i$, and the set of $\gamma_i$ by $\gamma$. Condition III is satisfied only if $+\beta$ and $-\beta$ together
include all the non-zero elements of the group an equal number of times. In this case all
pairs of treatment symbols, one of which is 0, occur in an equal number, $\lambda$, of sequences.
Condition IV is satisfied only if all the treatments immediately following 0 in sequences,
i.e. the elements in $\alpha$, include each non-zero element of the group an equal number of times.
Similarly, all the sequences in which 0 occurs in the final period contain as the other entries
the elements in $-\gamma$ and the sequences in which 0 occurs in any but the final period contain
as entries in the final period the elements in $\gamma$. It follows that conditions VI and VII are
satisfied only if the elements in $\gamma$ include each non-zero element of the group an equal
number of times. All the new conditions are sufficient, as sequences involving an element
other than 0 are obtained by adding this element to each of the entries including 0 in the
above diagram of sequences. Thus the treatments immediately following w, where w is an
element of the group, are given by adding w to each element in $\alpha$, and can be represented by
$\alpha + w$. If $\alpha$ includes each element of the group equally often then so must $\alpha + w$.

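It is convenient in testing the balance of a proposed arrangement to write down b/v arrays
of symbols, one for each Latin rectangle:

$$
\begin{array}{llll}
\delta_{1i} & & & \\
 & \delta_{1i}+\delta_{2i} & & \\
\delta_{2i} & & \delta_{1i}+\delta_{2i}+\delta_{3i} & \\
 & \delta_{2i}+\delta_{3i} & & \cdots \quad \delta_{1i}+\delta_{2i}+\cdots+\delta_{k-1,i} \\
\delta_{3i} & & \cdots & \\
\vdots & \vdots & & \\
 & \delta_{k-2,i}+\delta_{k-1,i} & & \\
\delta_{k-1,i} & & &
\end{array}
$$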












Such an array is described here as a triangular diagram. It is then possible to identify
$\alpha$, $\beta$, $\gamma$ without much difficulty.

Examples of finite additive Abelian groups are provided by the systems of non-negative
residues (mod v), where v is any positive integer. For the design of Fig. 2 in which v = 7 the
triangular diagrams are:


  5               3               6
      6               5               3
  1               2               4

1st rectangle   2nd rectangle   3rd rectangle


$\alpha$, $\gamma$ are easily identified and each seen to include all the non-zero residues just once. The
elements in $\beta$ are (5, 1, 6, 3, 2, 5, 6, 4, 3) and the elements in $-\beta$ are therefore (2, 6, 1, 4, 5, 2,
1, 3, 4), so that $+\beta$ and $-\beta$ together include each non-zero residue 3 times.
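
The test of balance through the triangular diagrams lends itself to a short routine. The following is a minimal sketch, assuming the group is the residues mod v; the function names are hypothetical. It collects the three sets of elements from each set of differences and checks that the first, the second together with its negatives, and the third each cover the non-zero residues equally often.

```python
from collections import Counter


def alpha_beta_gamma(diff_sets, v):
    """Elements of the three sets read off the triangular diagrams."""
    alpha, beta, gamma = [], [], []
    for diffs in diff_sets:
        n = len(diffs)
        alpha += [d % v for d in diffs]
        for start in range(n):
            total = 0
            for stop in range(start, n):
                total = (total + diffs[stop]) % v
                beta.append(total)            # sum of a run of successive differences
                if stop == n - 1:
                    gamma.append(total)       # runs that end with the last difference
    return alpha, beta, gamma


def equally_often(elements, v):
    counts = Counter(elements)
    return len({counts[r] for r in range(1, v)}) == 1


def is_balanced(diff_sets, v):
    alpha, beta, gamma = alpha_beta_gamma(diff_sets, v)
    if 0 in beta:                             # a treatment would repeat in a sequence
        return False
    plus_minus_beta = beta + [(-b) % v for b in beta]
    return (equally_often(alpha, v)
            and equally_often(plus_minus_beta, v)
            and equally_often(gamma, v))


print(is_balanced([(5, 1), (3, 2), (6, 4)], 7))
```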

Whilst, for the sake of simplicity, finite additive Abelian groups have been used in the 
above discussion it is possible to consider other mathematical systems (not necessarily
satisfying all the postulates of groups). To take an obvious example a multiplicative group 
could be used, in which case multiplication of the treatment symbols in the leading sequences 
by elements of the group other than the identity would yield the remaining sequences. 


6. DESIGNS BASED ON COMPLETE LATIN SQUARES 
Some designs can be obtained from complete Latin squares. The number of treatments is 
then restricted to be the same as the number of periods. Represent the treatments by the 
non-negative residues (mod v), where v is any positive integer. Williams (1949) has shown
that for even v = k, when the minimum value of b/v is 1, the differences
$$1,\ v-2,\ 3,\ v-4,\ \ldots,\ 2,\ v-1$$
provide a cyclic solution. For odd v = k, when the minimum value of b/v is 2, the two
squares are defined by the differences
$$1,\ v-2,\ 3,\ v-4,\ 5,\ \ldots,\ v-2,\ 1,$$
$$v-1,\ 2,\ v-3,\ 4,\ v-5,\ \ldots,\ 2,\ v-1.$$
These are fairly straightforward cases. Inspection shows that the differences, $\alpha$, for the
above designs include each of the non-zero residues (mod v) equally often. Latin squares
and Latin squares with one missing row are known to be balanced in the sense demanded
by conditions III, V and VI. In any case the contents of $+\beta$ and $-\beta$ together and $\gamma$ can be
checked as in the example of the previous section.
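
The differences quoted from Williams (1949) can be generated and checked mechanically. This is a sketch under the reading that the jth difference is j when j is odd and v − j when j is even, with the second square for odd v given by the negatives; the function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def williams_differences(v):
    """One set of differences for even v, two sets (one square each) for odd v."""
    first = [j if j % 2 else v - j for j in range(1, v)]
    if v % 2 == 0:
        return [first]
    return [first, [(-d) % v for d in first]]


def sequences(v):
    """All treatment sequences, developed cyclically from each leading sequence."""
    out = []
    for diffs in williams_differences(v):
        leading = [0] + [t % v for t in accumulate(diffs)]
        out += [[(t + w) % v for t in leading] for w in range(v)]
    return out


def follow_counts(v):
    """How often each treatment immediately follows each other treatment."""
    return Counter((a, b) for s in sequences(v) for a, b in zip(s, s[1:]))


print(williams_differences(6), williams_differences(5))
```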


7. DESIGNS BASED ON COMPLETE SETS OF ORTHOGONAL LATIN SQUARES 
In this section it is shown that designs can be constructed for any v which is a prime or 
a power of a prime, any k < v and a value of b/v, which is not necessarily the minimum value, 
given by v—1. An example of such a design is given in Fig. 3. 








Periods                  Sequences

I       1 2 3 4     1 2 3 4     1 2 3 4
II      2 1 4 3     3 4 1 2     4 3 2 1
III     3 4 1 2     4 3 2 1     2 1 4 3




















Fig. 3. Change-over design for four treatments and three periods. 










This design which, on inspection, is seen to be balanced was obtained from the first three 
rows of the three orthogonal 4 x 4 Latin squares listed by Fisher & Yates (1948). 

Consider a set of v— 1 orthogonal v x v Latin squares obtained by a field method and such 
that any square of the set is obtained from any other square of the same set by a permutation 
of rows. Such a set exists for all values of v which are primes or powers of primes. The
balanced designs are obtained by taking k corresponding rows from each of the v— 1 squares, 
the columns of symbols then defining the treatment sequences. 

The proof depends on the construction of the orthogonal squares as described by Stevens
(1939). Consider a field of v symbols: $0, 1, u_2, \ldots, u_{v-1}$, where $v = P^N$, P is a prime and N any
positive integer. A Latin square of symbols is written as

$$u_x u_y + u_z \quad (u_x \neq 0), \qquad (5)$$

where $u_y$ defines the row, $u_z$ the column and $u_y$, $u_z$ take the values $0, 1, u_2, \ldots, u_{v-1}$. For
a given square $u_x$ is constant, but if it is allowed to take the values $1, u_2, \ldots, u_{v-1}$ in different
squares a set of orthogonal squares is defined. Consider any pair of rows $u_y$, $u_{y'}$ $(u_y \neq u_{y'})$.
The differences between the symbols of these rows in the v − 1 squares are

$$u_x(u_y - u_{y'}), \quad u_x = 1, u_2, \ldots, u_{v-1}, \qquad (6)$$

and these, by definition of the field, must include each non-zero element of the field just
once. It follows that if we take any number of corresponding rows the symbols in a given
position of the triangular diagrams taken over all i (i.e. for all the squares) include each
non-zero element of the field just once. Thus, for example, there is a one-to-one
correspondence between the elements in $(\delta_{11}+\delta_{21}, \delta_{12}+\delta_{22}, \delta_{13}+\delta_{23}, \ldots, \delta_{1,v-1}+\delta_{2,v-1})$ and the
elements in $(1, u_2, \ldots, u_{v-1})$. The conditions I-VII for a balanced design are therefore
satisfied. It can be seen that this design is also balanced for any residual effects (second,
third, ..., (k − 1)th) which can be estimated.
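
For a prime number of treatments the field is simply the residues mod p, and the construction can be sketched directly. The code below assumes p is prime (prime powers would need finite-field arithmetic, which is not attempted) and assumes each square has entry x·row + column (mod p) for a fixed non-zero multiplier x; the function names are hypothetical.

```python
from collections import Counter


def change_over_design(p, k):
    """First k rows of the p - 1 squares with entry x*row + col (mod p), p prime."""
    sequences = []
    for x in range(1, p):              # one square per non-zero multiplier
        for col in range(p):           # each column is a treatment sequence
            sequences.append([(x * row + col) % p for row in range(k)])
    return sequences


def lag_counts(sequences, lag):
    """How often treatment b occurs `lag` periods after treatment a."""
    return Counter((s[t], s[t + lag]) for s in sequences for t in range(len(s) - lag))


d = change_over_design(5, 3)
print(len(d), set(lag_counts(d, 1).values()), set(lag_counts(d, 2).values()))
```

Counting at lag 2 as well as lag 1 reflects the remark that these designs are also balanced for the later residual effects.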

It is of some interest to note that more general abstract algebras can be used. The algebra 
of finite fields is associative and commutative with respect to addition and multiplication 
and satisfies both distributive laws, e.g. 


$$u_x(u_y - u_{y'}) = u_x u_y - u_x u_{y'},$$
$$(u_y - u_{y'})u_x = u_y u_x - u_{y'} u_x.$$


It appears that of the laws just mentioned it is sufficient to postulate one of the distributive 
laws. Orthogonal sets of Latin squares and change-over designs can then be obtained as 
before from (5). 

It is important to recognize that the above proof does not extend to all sets of orthogonal 
Latin squares. For example, Bose & Nair (1941) have given a set of squares associated with 
a non-Desarguesian geometry. This set does not have the property by which each square 
may be obtained from any other by a permutation of rows. Not all the arrangements 
obtained by taking k corresponding rows from each of the v—1 squares of the set are 
balanced. 

In the case of a design based on all the v!/(v—k)! possible sequences represent the treat- 
ments by the group of least positive residues (mod v). The differences between any two rows 
must include each of the non-zero residues equally often so that this design is also balanced 
for up to (k − 1)th residual effects. These arrangements have, however, the disadvantage that
large numbers of experimental units are required. 













8. THREE-PERIOD DESIGNS 


The results of previous sections will now be used in finding a number of three-period designs. 
A general series is available for any number of treatments which is a prime or power of 
a prime of the form 4n + 3, where n is a positive integer, with b/v taking the minimum value
of ½(v − 1). A number of designs for other values of v can also be obtained.

Consider a design with differences in the ith rectangle given by $(\delta_{1i}, \delta_{2i})$ $(i = 1, 2, \ldots, b/v)$.
Let the treatments be elements of the group of non-negative residues (mod v). As $\alpha$, the set
consisting of $\delta_1$ and $\delta_2$, and $\gamma$, the set of $\delta_2$ and $\delta_1 + \delta_2$, must each include all non-zero residues
equally often, there is a one-to-one correspondence between the elements in $\delta_1$ and the
elements in $\delta_1 + \delta_2$. Further, for any odd v the sum of all the elements in $\delta_1 + \delta_2$ must be zero,
so that the sums of the elements in $\delta_1$ and $\delta_2$ must each be zero. For odd v and a minimum
value of b/v, i.e. ½(v − 1), the $\delta_1$, $\delta_2$ should be chosen so that there is a one-to-one
correspondence between $-\delta_1$ and $\delta_2$. The following sets of elements illustrate the above remarks
in the case v = 7, b/v = 3.

$\delta_1$              5  3  6
$\delta_2$              1  2  4
$\delta_1 + \delta_2$   6  5  3
This design was given in Fig. 2. 

Designs have been obtained in this way for v = 7, 11, 13, 19, 21, and there is more than 
one solution for each design. Those for v = 13 are given by the pairs of differences (with 
alternatives for the elements of 6,): 


qe re we * (2) 6, 12 210 4 5 6 
6 8 411 71012 L xr rr ors 
or 4 7121011 8 oe "3 ¢-} 279 

es 8 2s 63 (4) & 4 7 8101112 
6 10 5 412 6 2 rr ee Se Se OR 
or 210 6 512 4 a. hi eee SS 


A class of designs can be obtained for certain odd v equal to a prime or power of a prime
by a method involving the use of the properties of finite fields. This is the general series
previously mentioned. Each treatment is represented by an element of a field of v elements.
The elements of the field can be represented by

$$0,\ 1,\ x,\ x^2,\ \ldots,\ x^{v-2}, \qquad (7)$$

where x is a primitive root of the equation $x^{v-1} = 1$, i.e. x is such that none of $x, x^2, \ldots, x^{v-2}$
is equal to 1. We have to divide the non-zero elements into ½(v − 1) pairs defining the
differences between rows in each rectangle. Consider the pairs

$$
\begin{array}{lllll}
\delta_1: & x^0 & x^2 & \cdots & x^{v-3}, \\
\delta_2: & x^1 & x^3 & \cdots & x^{v-2}.
\end{array} \qquad (8)
$$

Each non-zero element is represented once so that condition IV is satisfied. $\gamma$ is represented
by $x^1, x^3, x^5, \ldots, x^0(x+1), x^2(x+1), \ldots$. Now x + 1 must be either an even or odd power of x.
If it is an even power $\gamma$ includes all the non-zero elements of the field. If it is an odd power
we must use the pairs in the reverse order

$$
\begin{array}{lllll}
\delta_1: & x^1 & x^3 & \cdots & x^{v-2}, \\
\delta_2: & x^0 & x^2 & \cdots & x^{v-3},
\end{array} \qquad (9)
$$




when $\gamma$ is $x^0, x^2, x^4, \ldots, x^0(x+1), x^2(x+1), \ldots$, and includes all the non-zero elements. In any
case conditions VI and VII can be satisfied by taking one or other of (8) and (9).

In order that conditions III and hence V may be satisfied $+\beta$ and $-\beta$ together must
include all the non-zero elements three times. If $+\gamma$ and $-\gamma$ together include these twice
the elements
$$x^0,\ x^2,\ \ldots,\ -x^0,\ -x^2,\ \ldots$$
must include each non-zero element just once. As
$$x^{\frac{1}{2}(v-1)} = -1,$$
a property which follows from the fact that
$$(x^{\frac{1}{2}(v-1)} - 1)(x^{\frac{1}{2}(v-1)} + 1) = 0,$$
this will, with one exception, obviously be the case if ½(v − 1) is odd. The exception arises
in the case v = 3 when $x^0(1 + x)$ is 0. The above method gives a design for any odd v which is
a prime or a power of a prime, k = 3 and odd b/v = ½(v − 1), i.e. v is of the form 4n + 3 where
n is a positive integer. In general, solutions are found to be available from the differences
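in some order:

$$
\begin{array}{llllll}
\delta_1: & x^0 & x^2 & x^4 & \cdots & x^{v-3}, \\
\delta_2: & x^j & x^{j+2} & x^{j+4} & \cdots & x^{j+v-3},
\end{array} \qquad (10)
$$

where j is odd and not equal to ½(v − 1), and the powers are reduced where necessary, using
the relationship $x^{v-1} = 1$.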


For $v = P^N$ and N = 1 we obtain a cyclic solution, for N = 2 a dicyclic solution and in
general an N-cyclic solution. 

The following cases are considered here: 

(1) v = 7. A primitive element is 3 so that the power cycle $3^0, 3^1, 3^2, \ldots$ (reduced mod 7)
consists of the integers 1, 3, 2, 6, 4, 5. x + 1 is 4, i.e. $3^4$, and as ½(v − 1) is odd conditions III
and V are satisfied. From the pairs (8) we find the design given by the differences

$\delta_1$   1  2  4
$\delta_2$   3  6  5

which is the alternative solution to that given previously.

(2) v = 11. A primitive element is 2, x + 1 is 3 and therefore equal to $2^8$, and ½(v − 1) is odd.
Hence from (8) the differences are

$\delta_1$   1  4  5  9  3
$\delta_2$   2  8 10  7  6

(3) v = 13. Here ½(v − 1) is even and the method fails, but solutions have been given
earlier.

(4) v = $3^3$. This design, of little practical value, is of some theoretical interest, as we
have not so far considered an example of a power of a prime. The field can be represented
by the elements (7) together with the relationship $x^3 = x + 2$, and the reduction of coefficients
mod 3. Thus $x^4$ is x(x + 2), i.e. $x^2 + 2x$, and may conveniently be denoted by 120. We find
$x + 1 = x^9$ so must use (9) rather than (8). The solution is tricyclic and may be represented:


$\delta_1$   010, 012, 212, 122, 011, 112, 002, 200, 210, 222, 101, 220, 201
$\delta_2$   001, 100, 120, 111, 202, 110, 102, 020, 021, 121, 211, 022, 221













These pairs of differences give the leading sequences, the remainder being obtained by adding
each non-zero element of the field, here represented by the non-zero residues (mod 3, 3, 3),
in turn.
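
For prime v the choice between the pairs (8) and (9) can be made automatically. The sketch below assumes v is a prime of the form 4n + 3 greater than 3, so that the field is the residues mod v; the helper names are hypothetical, and the search for a primitive root is a plain brute-force loop.

```python
def primitive_root(p):
    """Smallest g whose powers run through all non-zero residues mod the prime p."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def three_period_differences(p, x=None):
    """Pairs of differences (first row, second row), one pair per rectangle."""
    if p % 4 != 3 or p == 3:
        raise ValueError("the series needs a prime p = 4n + 3 with n >= 1")
    x = primitive_root(p) if x is None else x
    powers = [pow(x, e, p) for e in range(p - 1)]
    even, odd = powers[0::2], powers[1::2]
    if (x + 1) % p in even:        # x + 1 is an even power of x: pairs (8)
        return even, odd
    return odd, even               # x + 1 is an odd power of x: pairs (9)


print(three_period_differences(7, x=3))
print(three_period_differences(11, x=2))
```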


9. NON-EXISTENCE OF TWO THREE-PERIOD DESIGNS 


There are still a number of important gaps in the table of three-period designs not covered 
by the previous section. It has been found that designs do not exist for v = 5, v = 9 with 
values of b/v which are minima. 

Consider any three-period design with odd v and a minimum value of b/v. In the first 
place there are just ½v(v − 1) sequences, so that the only possible way in which condition V
can be satisfied is to have each possible pair of symbols just once in the curtailed sequences of 
the first two periods. Thus if 12. is one sequence then 21. cannot be another. (The symbol . 
here refers to any treatment, and not necessarily to the same treatment each time it is 
used.) It follows from condition IV that .21 is included and .12 excluded. In this case 
a further sequence is 1.2 as 2.1 is not allowable (condition VI). 

Now consider the design v = 5, b/v = 2. Number the treatments from 1 to 5. Without 
any loss of generality we may write, referring to condition VI, one pair of sequences as 


1 3 
2 4 
5 5 


These are the only sequences in which 5 may occur in the final period, and the remaining
treatments must include each other symbol just once. Now 5 has immediately followed
both 2 and 4, so that another pair of sequences must be

1   3
5   5
.   .

Further, 1 must come immediately before each of 2, 5, 3, 4 so that the sequences

(v) (w)
 1   1
 3   4

are required where (v), (w) are 2 and 5 in some order. But if (v) is either 2 or 5 condition V 
cannot be satisfied as the sequences 12. and 15. have already occurred. It follows that 
the design is non-existent. The design for the case v = 9, k = 3, b/v = 4 is also non-existent. 
This can be demonstrated in much the same way, but the proof is lengthy and is omitted. 


10. A SECOND METHOD OF CONSTRUCTION 


No design has been found for v = 6, k = 3,b/v = 5, but one can be obtained for b/v = 10, 
by a method described by the author elsewhere (Patterson, 1951). This design is based on 
the balanced incomplete block design for six varieties in ten blocks of three. Two orthogonal 
3 x 3 Latin squares, themselves forming a balanced design, are constructed for the treatment 
symbols of each block. Thus two of the incomplete blocks might be written (1, 2, 3) and
(2, 4, 5), in which case twelve of the sequences of the final design would be given by the
columns of


1 2 3     1 2 3     2 4 5     2 4 5
2 3 1     3 1 2     4 5 2     5 2 4
3 1 2     2 3 1     5 2 4     4 5 2


Such designs, which usually, however, do not have minimum values of b/v, can be con- 
structed where any balanced incomplete block design exists, superimposing on each block 
either one or two Latin squares of the type described in §6, depending on whether k is 
even or odd. 
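
The superposition can be illustrated for six treatments in three periods. The block list below is an assumption made for the example: it is one balanced incomplete block design for six varieties in ten blocks of three, chosen so that it contains the blocks (1, 2, 3) and (2, 4, 5) used above, and it is not necessarily the arrangement the author had in mind. Only the "immediately follows" balance of condition IV is checked.

```python
from collections import Counter

# Assumed blocks: every pair of the six varieties shares exactly two blocks.
BLOCKS = [(1, 2, 3), (1, 2, 4), (1, 3, 5), (1, 4, 6), (1, 5, 6),
          (2, 3, 6), (2, 4, 5), (2, 5, 6), (3, 4, 5), (3, 4, 6)]


def block_sequences(block):
    """Six sequences from two orthogonal 3 x 3 Latin squares on one block."""
    a, b, c = block
    first = [(a, b, c), (b, c, a), (c, a, b)]     # columns of the first square
    second = [(a, c, b), (b, a, c), (c, b, a)]    # columns of the second square
    return first + second


design = [s for block in BLOCKS for s in block_sequences(block)]
follows = Counter((x, y) for s in design for x, y in zip(s, s[1:]))
print(len(design), "sequences;", set(follows.values()))
```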

This method may be used in the two cases v = 7 and v = 13 with k = 4, as 4 × 7 and 4 × 13
Youden squares exist. For these designs b/v = 4 and this is a minimum value in the second
case. When v = 13 the solution may be represented in terms of four sets of three differences
in the additive group of non-negative residues (mod 13). An example is


$\delta_1$   1  8 10  7
$\delta_2$   2  4  9 11
$\delta_3$   6  3  5 12


As a balanced incomplete block design exists for 10 varieties in 15 blocks of 4 a design 
balanced for first residual effects can be obtained for b/v = 6. The design cannot be arranged 
as 6 separate 4 x10 rectangles but can be divided into 4 x 4 squares. This point will be 
raised later. 


11. A SERIES OF DESIGNS FOR k = ½(v + 1), b/v = 2


In the previous section a design was given for v = 7, k = 4, b/v = 4. The minimum value of
b/v is, however, min b/v = 2. Consider the set of differences 1, 2, 3 and the set given by the
inverses, i.e. 6, 5, 4, in the additive group of non-negative residues (mod 7). All the conditions
for balance are satisfied, each non-zero residue being included in $\alpha$ just once, in $+\beta$ and $-\beta$
together just four times, and in $\gamma$ just once. The design is given in Fig. 4.








Periods                    Sequences
I       0 1 2 3 4 5 6     0 1 2 3 4 5 6
II      1 2 3 4 5 6 0     6 0 1 2 3 4 5
III     3 4 5 6 0 1 2     4 5 6 0 1 2 3
IV      6 0 1 2 3 4 5     1 2 3 4 5 6 0














Fig. 4. Four period change-over design for seven treatments, requiring fourteen units. 


It is of interest to note that similar properties hold for some other prime v. For any such
prime there is more than one solution. In fact a balanced design with k = ½(v + 1), b/v = 2
exists for any prime v equal to 4n + 3 where n is a positive integer. Treatments are represented
by the elements of the group of non-negative residues (mod v). The sets of differences are


$$\{2s, 4s, 6s, \ldots, (v-1)s\} \ \text{and}\ \{-2s, -4s, -6s, \ldots, -(v-1)s\} \pmod{v}, \qquad (11)$$


where s is a positive integer not a multiple of v.
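
The two sets of differences in (11) are easy to generate, and the three counts that decide balance can be tallied from them. This is a sketch for prime v of the form 4n + 3 using ordinary residues mod v; the function names are hypothetical.

```python
from collections import Counter


def series_differences(v, s=1):
    """The two sets of differences (11), reduced mod v."""
    first = [(2 * j * s) % v for j in range(1, (v - 1) // 2 + 1)]
    second = [(-d) % v for d in first]
    return first, second


def tallies(diff_sets, v):
    """Counts of each residue among the differences, among the run sums and
    their negatives taken together, and among the run sums ending at the
    last difference."""
    alpha, beta, gamma = Counter(), Counter(), Counter()
    for diffs in diff_sets:
        n = len(diffs)
        alpha.update(d % v for d in diffs)
        for start in range(n):
            total = 0
            for stop in range(start, n):
                total = (total + diffs[stop]) % v
                beta[total] += 1
                beta[-total % v] += 1
                if stop == n - 1:
                    gamma[total] += 1
    return alpha, beta, gamma


print(series_differences(7, s=4))
```

With s = 4 and v = 7 the first set reduces to 1, 2, 3, the differences of the four-period design for seven treatments.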













It is obvious that $\alpha$ includes each non-zero residue just once. The set $\beta$ of all elements in
the two triangular diagrams can be separated into $\beta'$ and $-\beta'$ as the elements of one
diagram consist of the inverses of the corresponding elements of the other. The elements
in $\beta'$ can be written as the triangular diagram:

$$
\begin{array}{llll}
2s & & & \\
 & 6s & & \\
4s & & 12s & \\
 & 10s & & \ddots \\
6s & & \vdots & \\
\vdots & \vdots & 3(v-3)s & \\
 & 2(v-2)s & & \\
(v-1)s & & &
\end{array}
$$

There are simple relationships between the elements in $\beta'$. It can be shown that
$$(i+j-1) + (i+j-3) + \cdots + (i-j+1) = ij \quad \text{for } i > j.$$
If, therefore, we take values of i, j such that i + j is odd, $i + j \le v$ and i > j, each element of $\beta'$ can
be defined. We are led to consider the matrix A equal to $(a_{ij})$, where $a_{ij} = ijs \pmod{v}$
(i, j = 1, 2, ..., v − 1), and v is an odd prime. Thus

$$
A = \begin{pmatrix}
s & 2s & \cdots & (v-1)s \\
2s & 4s & \cdots & 2(v-1)s \\
3s & 6s & \cdots & 3(v-1)s \\
\vdots & \vdots & & \vdots \\
(v-1)s & 2(v-1)s & \cdots & (v-1)^2 s
\end{pmatrix} \pmod{v}.
$$

Certain properties of this matrix are noted. As $a_{ij} = a_{ji}$ it is obviously symmetrical about
the principal diagonal. Further
$$ij \equiv i'j' \pmod{v},$$
where $i' = v - i$, $j' = v - j$,
so that A is symmetrical about the secondary diagonal (i.e. $a_{ij} = a_{i'j'}$). As
$$i'j \equiv -ij \pmod{v}, \qquad ij' \equiv -ij \pmod{v},$$
we have
$$a_{ij} \equiv -a_{i'j} \equiv -a_{ij'} \pmod{v}. \qquad (12)$$
Each of i and j include all the non-zero residues just once and therefore so does each row and
column of A.
Now construct the matrix B equal to $(b_{ij})$, where
$$b_{ij} = a_{ij} \ (i + j \text{ odd}), \qquad b_{ij} = 0 \ (i + j \text{ even}).$$
Using (12) it follows that the elements of any row or column of B and their inverses include
all the non-zero residues just once. The leading diagonal of B consists of 0, 0, 0, ..., so that
if we define the matrix C equal to $(c_{ij})$, where
$$c_{ij} = b_{ij} \ (i > j), \qquad c_{ij} = 0 \ (i < j),$$
we find that +C and −C together include each non-zero residue just ½(v − 1) times.




Some properties of quadratic residues R, and quadratic non-residues N, are now required.
(1) A number R is a quadratic residue of a prime v if there exists some r such that

$$r^2 \equiv R \pmod{v}.$$

The least positive residues of the squares

$$1^2,\ 2^2,\ \ldots,\ \{\tfrac{1}{2}(v-1)\}^2$$

are different and yield all the quadratic residues.
(2) Euler's criteria for quadratic residues and non-residues are

$$R^{\frac{1}{2}(v-1)} \equiv 1 \pmod{v},$$

$$N^{\frac{1}{2}(v-1)} \equiv -1 \pmod{v}.$$

It follows that if v is of the form 4n + 3 then −R is a quadratic non-residue.

(3) The product of two quadratic non-residues is a quadratic residue. 

(4) The product of a quadratic residue and a quadratic non-residue is a quadratic 
non-residue. 

The above results can be proved quite easily if it is realized that the quadratic residues 
consist of the even powers of the primitive elements. 
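
These four properties are easy to confirm numerically for small primes. The snippet below is only a check of the stated facts, with hypothetical function names; it lists the quadratic residues from the squares of 1, 2, ..., ½(v − 1) and evaluates Euler's criterion with modular exponentiation.

```python
def quadratic_residues(v):
    """Quadratic residues of the odd prime v, from the first (v - 1)/2 squares."""
    return {(r * r) % v for r in range(1, (v - 1) // 2 + 1)}


def euler_symbol(a, v):
    """a^((v-1)/2) mod v, reported as +1 or -1 for a not divisible by v."""
    t = pow(a, (v - 1) // 2, v)
    return -1 if t == v - 1 else t


for v in (7, 11, 19):
    residues = quadratic_residues(v)
    print(v, sorted(residues), [euler_symbol(a, v) for a in range(1, v)])
```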

For the remainder of the discussion we take v in the form 4n + 3. Consider the non-zero
elements of the secondary diagonal of C. These can be shown as

$$\gamma' = \left(-1^2 s,\ -2^2 s,\ -3^2 s,\ \ldots,\ -\{\tfrac{1}{2}(v-1)\}^2 s\right).$$

Using the above properties we find that $\gamma'$ includes all the quadratic residues or quadratic
non-residues of v according to whether s is a quadratic non-residue or quadratic residue.
From property (2) $-\gamma'$ includes the remaining non-zero residues.
Finally, define the matrix D, equal to $(d_{ij})$, such that

$$d_{ij} = c_{ij} \ (i + j \le v), \qquad d_{ij} = 0 \ (i + j > v).$$

Thus

$$
D = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
2s & 0 & 0 & \cdots & 0 \\
0 & 6s & 0 & \cdots & 0 \\
4s & 0 & 12s & \cdots & 0 \\
0 & 10s & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 2(v-2)s & 0 & \cdots & 0 \\
(v-1)s & 0 & 0 & \cdots & 0
\end{pmatrix} \pmod{v}.
$$

The secondary diagonal is the same as that of C, and as C is symmetrical about this diagonal
D and −D together include each non-zero residue n + 1 times. The non-zero elements in
D are the elements in $\beta'$. It has been shown that $\alpha$ and similarly $\gamma'$ and $-\gamma'$ together,
i.e. $\gamma$, include each non-zero residue just once. The differences (11) therefore give a balanced
design. Different solutions are given by taking s = 1, 2, ..., ½(v − 1).













It is worth noting that the existence of such designs implies the existence of a v × ½(v + 1)
and hence also a v × ½(v − 1) Youden square for prime v of the form 4n + 3, a result previously
obtained by Bose (1939).


12. THE DESIGN FOR v = 13, k=5, b/v=3 


A design is available for 13 treatments and 5 periods requiring 39 units. Let the treatments
be represented by the elements of the field of 13 elements. The differences in the three
rectangles are


$\delta_1$   $x^0$  $x^4$  $x^8$
$\delta_2$   $x^1$  $x^5$  $x^9$
$\delta_3$   $x^2$  $x^6$  $x^{10}$
$\delta_4$   $x^3$  $x^7$  $x^{11}$


The primitive root is x = 2, so that $1 + x = x^4$, $1 + x + x^2 = x^{11}$, $1 + x + x^2 + x^3 = x$. Each
non-zero element is included once in $\alpha$, and using the above relationships and $x^{12} = 1$,
$x^6 = -1$, we find that each non-zero element is included once in $\gamma$ and five times in $+\beta$ and
$-\beta$ together. The differences giving a balanced design are therefore:


$\delta_1$   1  3  9
$\delta_2$   2  6  5
$\delta_3$   4 12 10
$\delta_4$   8 11  7
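
As a check on this design, the 39 sequences can be written out and counted directly. The sketch below assumes that the rectangle numbered c (c = 0, 1, 2) uses the four successive powers of the primitive root 2 starting at the power 4c, reduced mod 13; it then counts how often pairs of treatments occur together (condition III) and how often each treatment immediately follows each other treatment (condition IV).

```python
from collections import Counter
from itertools import accumulate, combinations

V, X = 13, 2  # field of 13 elements, primitive root x = 2

# Rectangle c uses the differences x^(4c), x^(4c+1), x^(4c+2), x^(4c+3).
diff_sets = [[pow(X, 4 * c + r, V) for r in range(4)] for c in range(3)]

sequences = []
for diffs in diff_sets:
    leading = [0] + [t % V for t in accumulate(diffs)]
    sequences += [[(t + w) % V for t in leading] for w in range(V)]

# Condition III: how often each unordered pair of treatments shares a sequence.
together = Counter(frozenset(p) for s in sequences for p in combinations(s, 2))
# Condition IV: how often each treatment immediately follows each other treatment.
follows = Counter((a, b) for s in sequences for a, b in zip(s, s[1:]))

print(diff_sets, set(together.values()), set(follows.values()))
```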


13. FURTHER DESIGNS FOR ODD NUMBERS OF PERIODS 


It is not difficult to extend the series of three-period designs for values of v which are primes 
or powers of primes of the form 4n + 3 to other odd numbers of periods. As before represent 
the treatments by the elements of a field of v elements. The method of construction is to 
write down sets of differences satisfying certain conditions: 

(1) The first set should contain even and odd powers of x, the primitive element, equally 
often. 

(2) The cumulative sums in the first set, starting with the last difference, should contain 
even and odd powers of x equally often. 

(3) The leading sequence obtainable from the differences in the first set contains no 
treatment more than once. 

(4) The remaining sets of differences, making a total of ½(v − 1), are obtained by
multiplying the elements of the first set by each of the even powers of x in turn.

Such a method is similar to that used in constructing designs from sets of orthogonal
Latin squares, the difference being that $u_x$ takes values from only half the non-zero elements
of the field.

It can easily be shown that such sets of differences lead to a balanced design for an odd 
number of periods. The proof depends on the property that for the values of v under con- 
sideration the inverses (with respect to addition) of the even powers of x are odd powers 
of x. If there are v − 1 differences in each set, ½(v − 1) mutually orthogonal Latin squares are
obtained. In this case conditions VI and VII are automatically satisfied so that there is no 
need to consider the cumulative sums. 




Four examples are given below. These are such that the symbols of each three successive 
periods themselves define a balanced three-period design. In each case the first set of 
differences is given, the remaining sets being obtained by multiplying the differences of the 
first set by the quadratic residues other than 1: 


(1) v = 7, k = 5, b/v = 3     (5, 1, 3, 2)
    quadratic residues of 7: 1, 2, 4
(2) v = 11, k = 5, b/v = 5    (1, 8, 5, 7)
(3) v = 11, k = 7, b/v = 5    (1, 2, 5, 7, 3, 6)
(4) v = 11, k = 9, b/v = 5    (1, 8, 5, 7, 3, 2, 4, 10)
    quadratic residues of 11: 1, 3, 4, 5, 9.
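
The step from a first set of differences to the full design can be sketched as follows, assuming prime v so that the field is the residues mod v and the even powers of the primitive element are the quadratic residues. The function names are hypothetical; the final count covers only how often each treatment immediately follows each other treatment.

```python
from collections import Counter
from itertools import accumulate


def quadratic_residues(v):
    return sorted({(r * r) % v for r in range(1, v)})


def difference_sets(first_set, v):
    """First set multiplied by each quadratic residue of the prime v in turn."""
    return [[(q * d) % v for d in first_set] for q in quadratic_residues(v)]


def sequences(first_set, v):
    out = []
    for diffs in difference_sets(first_set, v):
        leading = [0] + [t % v for t in accumulate(diffs)]
        out += [[(t + w) % v for t in leading] for w in range(v)]
    return out


def follow_counts(first_set, v):
    return Counter((a, b) for s in sequences(first_set, v) for a, b in zip(s, s[1:]))


print(difference_sets((5, 1, 3, 2), 7))
```

For the first example the first two differences of the three sets are (5, 1), (3, 2) and (6, 4), the pairs of the three-period design for seven treatments.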


14. GENERAL REMARKS 


Referring to simple change-over trials Cochran et al. (1941) have shown that there is some 
advantage in using blocks of cows, each block corresponding to a single square. This 
procedure may often be useful with other types of experimental material. Most of the 
designs of this paper, including all those found by the method of differences, can be arranged 
in blocks of v units. 

In the cases in which the balanced designs with v = k = 3, b/v = 2 or v = k = 4, b/v = 1
are superimposed on each block of a balanced incomplete block arrangement the size of
blocks can always be equal to k. Thus the design v = 13, k = b/v = 4 can be arranged in
blocks of 4 units. Some of these designs can also be arranged in blocks of v units, but 
there are exceptions. For example, the arrangement for v = 10, k = 4, b/v = 6 is based on a 
non-resolvable incomplete block design and cannot be arranged in six rectangles. 

A word of warning is necessary on the analysis connected with the designs of this paper. 
The straightforward analysis following from the least squares method of fitting constants 
including constants for residual effects is not appropriate under all systems of correlations 
between experimental error terms. In practice, with certain types of material, it seems 
likely that no serious error is made if this point is ignored. A discussion of the analysis and 
of advantages and disadvantages obtained in using change-over trials is given by Patterson 
(1950, 1951). 

There is also a large number of partially balanced designs. The analysis is rather more 
complicated. It is not proposed to discuss these designs here. 


15. DESIGNS BALANCED FOR FIRST AND SECOND RESIDUAL EFFECTS 


A design is balanced for first and second residual effects if, in addition to conditions I-VII, 
the arrangement satisfies 

VIII. Each treatment is preceded, two periods earlier, by each other treatment equally 
often. 

IX. Each ordered pair of treatments occurs at the end of an equal number of sequences. 

X. In those sequences in which a given treatment occurs in the last but one period the 
other treatments occur equally often. 

XI. In those sequences in which a given treatment occurs in any but the last but one 
period the other treatments occur equally often in the last but one period. 








The restrictions on b/v are, from conditions IV and VIII,
$$b(k-1)/v \equiv 0 \pmod{v-1},$$
$$b(k-2)/v \equiv 0 \pmod{v-1},$$
so that by subtracting $b/v \equiv 0 \pmod{v-1}$.


It follows that b/v cannot be less than v— 1. Condition IX, which is less obvious, implies the 
same thing. 
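The two congruences can be checked numerically. The following is a minimal sketch only; the function name and the search limit are assumptions introduced for illustration and do not appear in the paper.

```python
def admissible_b_over_v(v, k, limit):
    """Return the values m = b/v (1 <= m <= limit) allowed by both congruences."""
    allowed = []
    for m in range(1, limit + 1):
        first = (m * (k - 1)) % (v - 1) == 0    # b(k-1)/v = 0 (mod v-1)
        second = (m * (k - 2)) % (v - 1) == 0   # b(k-2)/v = 0 (mod v-1)
        if first and second:
            allowed.append(m)
    return allowed


# v = 5 treatments in k = 3 periods: only multiples of v - 1 = 4 survive.
print(admissible_b_over_v(5, 3, 12))
```

Every value returned is a multiple of v - 1, in agreement with the statement that b/v cannot be less than v - 1.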

If a design with v = k satisfies conditions I and II, then conditions V-VII, X, XI are 
automatically satisfied. Williams (1950) has shown that the construction of balanced 
designs in the cases v = k is related to the problem of finding $\tfrac{1}{2}v(v-1)$ arrangements of
v+1 objects in a circle such that no object has the same pair of neighbours in any two 
circles. Solutions to this problem exist for general values of v. 

Designs balanced for pairs of residual effects will not be considered further in this paper. 
It should be noted, as remarked in §7, that solutions are provided for v prime or a power of 
a prime and any k such that v > k > 3 by designs involving rectangles from complete sets of 
orthogonal Latin squares. 


SUMMARY 


This paper deals with the construction of designs for experiments in which sequences of 
treatments are applied to the same experimental units. These designs have certain properties 
of balance which facilitate the estimation of the effects of treatments and the errors. Designs 
with 3, 4, 5 and 6 periods and requiring 60 or fewer units for the case in which the effects 
considered are the direct and residual effects receive most consideration. A number of 
designs outside this range are included. There are still a number of designs which have not 
been found or proved non-existent, particularly for non-prime numbers of treatments. 


I am indebted to Dr E. M. Patterson for useful suggestions on the mathematics of the 
section concerned with the designs for prime v of the form 4n + 3, and a minimum value of 
b/v equal to 2. 


REFERENCES 


Bose, R. C. (1939). On the construction of balanced incomplete block designs. Ann. Eugen., Lond.,
9, 353.

Bose, R. C. & Nair, K. R. (1941). On complete sets of Latin squares. Sankhyā, 5, 361.

Cochran, W. G., Autrey, K. M. & Cannon, C. Y. (1941). A double change-over design for dairy cattle
feeding experiments. J. Dairy Sci. 24, 937.

Fisher, R. A. & Yates, F. (1948). Statistical Tables for Biological, Agricultural and Medical Research.
Edinburgh: Oliver and Boyd.

Patterson, H. D. (1950). The analysis of change-over trials. J. Agric. Sci. 40, 375.

Patterson, H. D. (1951). Change-over trials. J.R. Statist. Soc. (in the Press).

Stevens, W. L. (1939). The completely orthogonalized Latin square. Ann. Eugen., Lond., 9, 82.

Williams, E. J. (1949). Experimental designs balanced for the estimation of residual effects of
treatments. Aust. J. Sci. Res. A, 2, 149.

Williams, E. J. (1950). Experimental designs balanced for pairs of residual effects. Aust. J. Sci.
Res. A, 3, 351.

Yates, F. (1936). Incomplete randomized blocks. Ann. Eugen., Lond., 7, 121.













MULTI-FACTOR DESIGNS OF FIRST ORDER 


G. E. P. BOX 


Imperial Chemical Industries, Dyestuffs Division Headquarters, 
Blackley, Manchester 


The problem discussed arises when it is possible to choose in advance the N combinations of levels at 
which a set of quantitative factors are to be held in a set of N experiments to determine the slopes of 
a regression surface (assumed planar). It is shown that the minimum variance property of an ‘optimum’ 
design arises from the shape of the design pattern and is independent of its orientation. This fact may 
be utilized as follows: 

(1) When prior knowledge of the response surface exists the design may be rotated to reduce 
possible bias. 

(2) The design may be rotated so that systematic effects, such as polynomial time trends and block 
effects, are eliminated without loss of efficiency.

(3) Subject to the conditions imposed by (2), the orientation of the design may be chosen at 
random. This has the effect of making the usual normal theory tests exact and completely independent 
of the distribution of the observations. 


1. Suppose that the effect of $k$ quantitative variables or factors $X_1, \dots, X_i, \dots, X_k$ (such
as time, temperature, concentration) on some measurable response (such as yield of product) 
is being studied in a region of the response surface that can be represented to a sufficient 
degree of accuracy by a polynomial equation of degree d. We define a design of order d as 
an arrangement of experiments which will allow all the coefficients in this polynomial to be 
separately determined. In this paper it is assumed that d is equal to 1, i.e. that a planar 
approximation is adequate. It is also assumed that the variables can be controlled exactly at 
levels decided in advance and that the observed response $y$ differs from $\eta$ due to experimental
error having variance $\sigma^2$:
$$E(y) = \eta, \qquad E(y-\eta)^2 = \sigma^2. \qquad (1)$$


We can perform N > k trials; the problem is to decide which N combinations of levels to use 
so that the constants defining the plane are estimated with maximum accuracy. We shall 
not apply the usual limitation that the design is to consist of combinations of a few fixed 
levels of the factors, but as a means of specifying the extent of variation for a given factor 
$X_i$, we define the unit $S_i$ for this variable as
$$S_i = +\left\{\frac{1}{N}\sum_{u=1}^{N}(X_{iu}-\bar{X}_i)^2\right\}^{1/2},$$
and write the design in terms of the standardized variables $x_{iu} = (X_{iu}-\bar{X}_i)/S_i$. It will be noted, therefore, that
for the standardized variables
$$\sum_{u=1}^{N} x_{iu} = 0, \qquad (2)$$
$$\sum_{u=1}^{N} x_{iu}^2 = N. \qquad (3)$$
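The standardization may be illustrated by a short sketch. The temperatures below are hypothetical, and the function name is introduced only for this example; the unit $S_i$ is taken with divisor $N$, as required for (3) to hold.

```python
import math


def standardize(levels):
    """Return (mean level, unit S, standardized values) for one factor's N levels."""
    n = len(levels)
    mean = sum(levels) / n
    # Unit S: root mean square deviation with divisor N, so that sum(x^2) = N.
    s = math.sqrt(sum((x - mean) ** 2 for x in levels) / n)
    return mean, s, [(x - mean) / s for x in levels]


temperatures = [60.0, 70.0, 80.0, 90.0]   # hypothetical levels of one factor
mean, s, x = standardize(temperatures)
print(mean, s)
print(sum(x), sum(v * v for v in x))       # conditions (2) and (3)
```

The original levels are recovered as mean + s * x, which is the relation used when a standardized design is translated back into working levels.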


A discussion of the problem of scaling and comparing experimental designs will be found 
in a recent paper (Box & Wilson, 1951), where there is an account of the planning of 
experiments to attain maxima in connexion with which this investigation was undertaken. 
The design matrix $\mathbf{D}$ is an $N \times k$ matrix providing a programme of experiments to be
performed. The $k$ elements $x_{1u}, \dots, x_{iu}, \dots, x_{ku}$ of the $u$th row are the levels of the standardized
variables to be used in the $u$th trial. They can also be regarded as defining the $k$ co-ordinates
of the $u$th experimental point in the $k$-dimensional factor space. To use the design the
experimenter must decide on suitable average levels $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k$ and units $S_1, S_2, \dots, S_k$
for the variables. The level to be used for the $i$th variable in the $u$th trial will then be
$X_{iu} = \bar{X}_i + S_i x_{iu}$.


2. Suppose the true regression plane in the region considered is
$$\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k. \qquad (4)$$
In an obvious matrix notation the $N$ equations (4) at the $N$ experimental points may be
written
$$\boldsymbol{\eta} = \mathbf{X}_1\boldsymbol{\beta}_1, \qquad (5)$$
where $\mathbf{X}_1 = [\mathbf{U} : \mathbf{D}]$ and each element of the column vector $\mathbf{U}$ is unity. If $\mathbf{Y}$ is a column
vector of observations $y_1, \dots, y_u, \dots, y_N$ made at these experimental points, then providing
$\mathbf{X}_1$ is of rank $k+1$ (which implies that $\mathbf{D}$ is of rank $k$), separate linear estimates $b_0, b_1, \dots, b_k$
of each of the $\beta$'s may be calculated. For a particular design $\mathbf{D}$, linear estimates having
smallest variances are provided by the method of least squares and are given by
$\mathbf{B}_1 = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{Y} = \mathbf{T}_1\mathbf{Y}$, and it is well known that the matrix of variances and covariances
for the estimates is $(\mathbf{X}_1'\mathbf{X}_1)^{-1}\sigma^2$. We have then to choose $\mathbf{D}$ so that the diagonal elements of
$(\mathbf{X}_1'\mathbf{X}_1)^{-1}$ are minimized.

Consider the symmetrical determinant of sums of squares and products $\Delta = |c_{ij}| = |\mathbf{X}_1'\mathbf{X}_1|$,
and denote by $C_{hh}$ the cofactor of $c_{hh}$ in $\Delta$ and by $C_{ij.hh}$ the cofactor of $c_{ij}$ in $C_{hh}$. Using
Cauchy's expansion we have
$$\Delta = c_{hh}C_{hh} - Q, \qquad (6)$$
where $Q$ is a quadratic form in the $k$ variables $c_{hi}$ $(i = 0, 1, \dots, h-1, h+1, \dots, k)$ and
$$Q = \sum_{i \neq h}\sum_{j \neq h} c_{hi}\,c_{hj}\,C_{ij.hh}. \qquad (7)$$

Now since $\mathbf{X}_1$ is of rank $k+1$, $Q$ is necessarily positive definite and $\Delta$ is positive. Also $c_{hh} = N$
(from (3)) and $V(b_h) = \sigma^2 C_{hh}/\Delta$, where $V(b_h)$ is the variance of $b_h$. Consequently, rearranging (6),
$$V(b_h) = N^{-1}\sigma^2\{1 + \Delta^{-1}(\text{positive definite quadratic form in the } c_{hi})\}.$$

Thus $V(b_h)$ is a minimum only when each of the $c_{hi}$ (the $k$ sums of products of the $h$th
variable with each of the remaining variables) is zero. When this is so all the $C_{hi}$ must be
zero also, and consequently $b_h$ is uncorrelated with each of the other estimates and has
variance $\sigma^2/N$. For maximum efficiency for all the coefficients then $\{c_{ij}\} = N\mathbf{I}$ and a suitable
design $\mathbf{D}$ is supplied by any $k$ columns after the first, of a matrix $N^{1/2}\mathbf{O}$, where $\mathbf{O}$ is orthogonal
with the elements of its first column all equal. This result was arrived at by Plackett
& Burman (1946). They postulated, however, that the $x$'s in their first-order optimum
designs should take only the values $+1$ and $-1$. With this limitation designs existed only
when $N$ was a multiple of 4 and they obtained arrangements for $k = 3, 7, 11, \dots, 99$ factors
using $N = 4, 8, 12, \dots, 100$ trials. Such designs may be used with qualitative or quantitative
variables; in our case where the variables are essentially quantitative this restriction is
not introduced and $N$ can have any value.
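The condition $\{c_{ij}\} = N\mathbf{I}$ can be verified directly for a small case. The sketch below uses the half-replicate of the $2^3$ factorial in four trials; the observations `y` are hypothetical and serve only to show that, for such a design, the least-squares estimates reduce to $\mathbf{X}_1'\mathbf{Y}/N$.

```python
def cross_products(columns):
    """Matrix of sums of squares and products of the given columns (X1'X1)."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


n = 4
unit = [1, 1, 1, 1]
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
x3 = [1, -1, -1, 1]
columns = [unit, x1, x2, x3]

c = cross_products(columns)
print(c)   # N times the identity matrix, so (X1'X1)^-1 = I/N

# Hypothetical observations; with X1'X1 = N*I the estimates are X1'Y / N,
# each with variance sigma^2 / N and uncorrelated with the others.
y = [10.2, 11.9, 9.7, 12.4]
b = [sum(col[u] * y[u] for u in range(n)) / n for col in columns]
print(b)
```

Any other set of mutually orthogonal columns with zero sums and sums of squares equal to $N$ would serve equally well; the values $\pm 1$ are used here only because they are easy to check by hand.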


3. For our problem, therefore, designs of optimum precision for up to $k = N-1$ factors
in $N$ experiments may be obtained from any orthogonal matrix $\mathbf{O}$ with elements in the first
column all equal and $\mathbf{X}_1 = N^{1/2}\mathbf{O}$. We now assume $k = N-1$ and consider the geometrical
implications of the above result. Since $\mathbf{D}$ is of rank $k = N-1$, the $N$ experimental points
are the vertices of an $(N-1)$-dimensional simplex. Write the $u$th row of $\mathbf{X}_1$ as $\mathbf{x}_u'$ and denote
the angle which the $u$th and $s$th experimental points make with the origin by $\theta_{us}$; then since
$\mathbf{X}_1 = [\mathbf{U} : \mathbf{D}]$ the distance of each experimental point from the origin is $(N-1)^{1/2}$ and
$$\mathbf{x}_u'\mathbf{x}_s = 0 = 1 + (N-1)\cos\theta_{us}, \qquad (8)$$
i.e.
$$\cos\theta_{us} = -(N-1)^{-1} \quad (\text{all } u \text{ and } s,\ u \neq s). \qquad (9)$$


Consequently this design is formed by the vertices of the regular $(N-1)$-dimensional simplex.
If two factors are tested in three trials the experimental points should be at the vertices of
an equilateral triangle; for three factors tested in four trials the experimental points should
form a regular tetrahedron and so on. It should be noted that no restriction is necessary on
the orientation of the design. We can turn the regular figure in any direction; this will
correspond simply to a different choice of the orthogonal matrix $\mathbf{O}$, and the variance
covariance matrix for the $b$'s will remain unchanged.

As an example, Fig. 1 shows two particular orientations of the optimum design for 
N=4,k=3. 


Fig. 1. Orientations of the optimum design for N = 4, k = 3.


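The design matrices are
$$
\mathbf{D}_1 =
\begin{array}{c|rrr}
 & x_1 & x_2 & x_3 \\ \hline
(1) & -1 & -1 & 1 \\
(2) & 1 & -1 & -1 \\
(3) & -1 & 1 & -1 \\
(4) & 1 & 1 & 1
\end{array}
\qquad
\mathbf{D}_2 =
\begin{array}{c|rrr}
 & x_1 & x_2 & x_3 \\ \hline
(1) & -1 & -1 & -1 \\
(2) & 1 & -1 & -1 \\
(3) & 0 & 2 & -1 \\
(4) & 0 & 0 & 3 \\ \hline
 & (\sqrt{2}) & (\sqrt{2}/\sqrt{3}) & (1/\sqrt{3})
\end{array}
\qquad (10)
$$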


$\mathbf{D}_1$ is the familiar half-replicate of the $2^3$ factorial; the other half replicate is obtained by
rotation of the first and completes the cube. $\mathbf{D}_2$ is also obtained by orthogonal rotation of
$\mathbf{D}_1$, so that the line joining the points (1) and (2) is parallel to the axis of $x_1$ and (1), (2) and
(3) fall on a plane parallel to the plane of $x_1$ and $x_2$. $\mathbf{D}_2$ is seen to have elements proportional
to Helmert's orthogonal matrix (for clarity the elements are given as whole numbers with
the necessary multiplier shown below). This latter design has rather a curious property,
for it is a ‘one factor at a time design’ although not of the orthodox pattern. To use it the
experimenter would first perform a ‘blank’ experiment with all factors at the lower levels;
in the second experiment the level of the first factor only would be changed; in all subsequent
experiments this would then be held at the average of these two levels. In the third experiment
the level of a second factor would be raised, and in all subsequent experiments this
factor would be held at the average level of the three experiments. This procedure could be
continued for any number of experiments and factors. The estimates would be uncorrelated,
and on the convention we have adopted concerning the units for the factors, the variance
of the estimates would be the same with both designs.
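The sequential construction just described can be sketched for general $N$. This is an illustration only: the function names are hypothetical, and the column multiplier $\{N/j(j+1)\}^{1/2}$ is inferred from the requirement that each column should have sum of squares $N$, rather than quoted from the text.

```python
import math


def helmert_design(n):
    """N x (N-1) design: factor j is first changed in trial j+1 and then held at its average."""
    columns = []
    for j in range(1, n):
        scale = math.sqrt(n / (j * (j + 1)))
        # j trials at the lower level, one trial raised, the remaining trials at the average (0).
        columns.append([-scale] * j + [j * scale] + [0.0] * (n - j - 1))
    return [list(row) for row in zip(*columns)]   # one row per trial


def cosine(p, q):
    """Cosine of the angle subtended at the origin by the points p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))


design = helmert_design(4)
for row in design:
    print([round(v, 3) for v in row])
print([round(cosine(design[u], design[s]), 6)
       for u in range(4) for s in range(u + 1, 4)])
```

For $N = 4$ every pair of points should give a cosine of $-1/3$, in agreement with (9), so the figure is a regular tetrahedron in a particular orientation.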


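4. Although the variances and covariances under orthogonal rotation of the design
remain constant, the magnitude and arrangement of the possible biases which might occur
if the planar approximation was inadequate do not. Suppose that, contrary to assumption,
to obtain a perfect fit it was necessary to include $S$ extra terms $\mathbf{X}_2\boldsymbol{\beta}_2$, so that instead of
(5) we had
$$\boldsymbol{\eta} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2, \qquad (11)$$
then (Box & Wilson, 1951) $\mathbf{B}_1$ would no longer supply unbiased estimates of $\boldsymbol{\beta}_1$ but instead
$$E(\mathbf{B}_1) = \boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2, \qquad (12)$$
where $\mathbf{A}$ is a $(k+1) \times S$ matrix of coefficients of the biases called the alias matrix and given
by $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$. With the orthogonal designs discussed here this simplifies to
$\mathbf{A} = N^{-1}\mathbf{X}_1'\mathbf{X}_2$. In judging first-order designs we shall consider possible biases due to terms
of second order. Now there are two varieties of second-order terms: those which are
coefficients of square terms $x_1^2$, $x_2^2$, etc., sometimes called quadratic effects, and those which
are the coefficients of product terms $x_1x_2$, $x_1x_3$, etc., sometimes called linear × linear interactions.
In what follows it is mathematically convenient to define the effect $\beta_{ii}$ as the
coefficient of $x_i^2$ whilst $\beta_{ij}$ is defined as the coefficient of $x_ix_j\sqrt{2}$. Equation (11) may then
be written
$$\eta = \beta_0 + \beta_1x_1 + \dots + \beta_kx_k + \beta_{11}x_1^2 + \dots + \beta_{kk}x_k^2 + \beta_{12}(x_1x_2\sqrt{2}) + \dots + \beta_{k-1,k}(x_{k-1}x_k\sqrt{2}), \qquad (13)$$
and the matrices of bias coefficients corresponding to $\mathbf{D}_1$ and $\mathbf{D}_2$ are found to be
$$
\mathbf{A}_1 =
\begin{array}{c|rrrrrr}
 & 11 & 22 & 33 & 12 & 13 & 23 \\ \hline
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & \sqrt{2} \\
2 & 0 & 0 & 0 & 0 & \sqrt{2} & 0 \\
3 & 0 & 0 & 0 & \sqrt{2} & 0 & 0
\end{array}
\qquad
\mathbf{A}_2 =
\begin{array}{c|rrrrrr|l}
 & 11 & 22 & 33 & 12 & 13 & 23 & \\ \hline
0 & 1 & 1 & 1 & 0 & 0 & 0 & \\
1 & 0 & 0 & 0 & -2 & -\sqrt{2} & 0 & (1/\sqrt{3}) \\
2 & -\sqrt{2} & \sqrt{2} & 0 & 0 & 0 & -\sqrt{2} & (1/\sqrt{3}) \\
3 & -1 & -1 & 2 & 0 & 0 & 0 & (1/\sqrt{3})
\end{array}
\qquad (14)
$$
The figures in brackets are multipliers of the rows of $\mathbf{A}_2$. Using $\mathbf{D}_2$, for example, the expected
value of $b_3$ when second-order terms were not all zero would be
$$E(b_3) = \beta_3 - (\beta_{11} + \beta_{22} - 2\beta_{33})/\sqrt{3}.$$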
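The bias coefficients can be computed directly from $\mathbf{A} = N^{-1}\mathbf{X}_1'\mathbf{X}_2$. The sketch below is illustrative: the function name is hypothetical, the simplified formula is valid only for the orthogonal designs discussed here, and the Helmert-type design entered as `d2` is written out from the description of the ‘one factor at a time’ arrangement in § 3.

```python
import math


def alias_matrix(design):
    """A = X1'X2 / N for an orthogonal first-order design (one row per trial)."""
    n, k = len(design), len(design[0])
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    x1 = [[1.0] + list(row) for row in design]
    # Columns of X2: squares x_i^2 first, then products x_i x_j weighted by sqrt(2).
    x2 = [[row[i] ** 2 for i in range(k)] +
          [math.sqrt(2) * row[i] * row[j] for i, j in pairs]
          for row in design]
    return [[sum(x1[u][h] * x2[u][c] for u in range(n)) / n
             for c in range(len(x2[0]))]
            for h in range(k + 1)]


r2, r23, r3 = math.sqrt(2), math.sqrt(2 / 3), 1 / math.sqrt(3)
d2 = [[-r2, -r23, -r3],
      [r2, -r23, -r3],
      [0.0, 2 * r23, -r3],
      [0.0, 0.0, 3 * r3]]

a2 = alias_matrix(d2)
# Row for b3, in units of 1/sqrt(3); columns are 11, 22, 33, 12, 13, 23.
print([round(v * math.sqrt(3), 6) for v in a2[3]])
```

The last row should reproduce the coefficients $-1, -1, 2$ on $\beta_{11}, \beta_{22}, \beta_{33}$ in the expression for $E(b_3)$, with no contribution from the product terms.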


5. If we could, we would choose to orient the optimum design so that the bias coefficients
were as small as possible. In this way both random and systematic errors might be
simultaneously minimized. Consider the $(k+1) \times (k+1)$ matrix $\mathbf{AA}'$. The sums of squares
of bias coefficients for the $k+1$ estimates $b_0, b_1, \dots, b_k$ are given by the diagonal elements,
and the magnitude of these would provide one indication of the efficacy of any particular
orientation. We find somewhat unexpectedly, however, that for these optimum first-order
designs $\mathbf{AA}'$ is invariant for any orthogonal rotation of the design. This is proved as follows.

Denote by $\mathbf{x}_1', \dots, \mathbf{x}_u', \dots, \mathbf{x}_N'$ the $N$ rows of the matrix $\mathbf{X}_1$ and by $\mathbf{X}_3$ a matrix whose $N$ rows
are $(\mathbf{x}_1')^{[2]}, \dots, (\mathbf{x}_u')^{[2]}, \dots, (\mathbf{x}_N')^{[2]}$, the derived power vectors of degree 2 (Aitken, 1948). Then
$\mathbf{X}_3 = [\mathbf{U} : \mathbf{D}\sqrt{2} : \mathbf{X}_2]$ and $N^{-2}\mathbf{X}_1'\mathbf{X}_3\mathbf{X}_3'\mathbf{X}_1 = \mathbf{AA}' + \mathbf{J}$, where $\mathbf{J}$ is a diagonal matrix in
which the first diagonal element is 1 and each of the remaining $k$ diagonal elements is 2.
Now suppose the design is submitted to orthogonal rotation and denote the new matrices
by $\tilde{\mathbf{D}}$, $\tilde{\mathbf{X}}_1$, $\tilde{\mathbf{X}}_3$ and $\tilde{\mathbf{A}}$. Then $\tilde{\mathbf{D}} = \mathbf{DG}$, where $\mathbf{G}$ is some $k \times k$ orthogonal matrix, and $\tilde{\mathbf{X}}_1 = \mathbf{X}_1\mathbf{H}$,
where $\mathbf{H}$ is a $(k+1) \times (k+1)$ orthogonal matrix consisting of $\mathbf{G}$ bordered by a first row
$\mathbf{r}' = (1\ 0\ 0 \dots 0)$ and a first column $\mathbf{r}$. Now $\mathbf{H}$ transforms the vector $\mathbf{x}_u'$; denote by $\mathbf{H}^{[2]}$ the
matrix which correspondingly transforms the vector $(\mathbf{x}_u')^{[2]}$.

Then
$$\tilde{\mathbf{A}}\tilde{\mathbf{A}}' + \mathbf{J} = N^{-2}\,\mathbf{H}'\mathbf{X}_1'\mathbf{X}_3\mathbf{H}^{[2]}\mathbf{H}^{[2]\prime}\mathbf{X}_3'\mathbf{X}_1\mathbf{H}, \qquad (15)$$
and since $\mathbf{H}$ is orthogonal so is $\mathbf{H}^{[2]}$. Now the $j$th diagonal element of $\mathbf{X}_3\mathbf{X}_3'$ is
$$(\mathbf{x}_j')^{[2]}(\mathbf{x}_j)^{[2]} = (\mathbf{x}_j'\mathbf{x}_j)^2 = N^2$$
and the $ij$th non-diagonal element is $(\mathbf{x}_i')^{[2]}(\mathbf{x}_j)^{[2]} = (\mathbf{x}_i'\mathbf{x}_j)^2 = 0$. Consequently the right-hand
side of (15) reduces to $N\mathbf{I}$. We find in consequence that $\tilde{\mathbf{A}}\tilde{\mathbf{A}}' = N\mathbf{I} - \mathbf{J}$, whatever the
orientation of the design. That is, the sum of squares of the coefficient of the biases for $b_0$ is
$N-1$, and for each of the effects $b_1 \dots b_k$ it is $N-2$. The result is of course only true for the
particular relative weighting of quadratic and interaction terms which has been adopted.
This relative weighting is, however, a reasonable one, and the important conclusion emerges
that if we have no prior knowledge concerning the relative importance of particular second-order
terms no arrangements which are dramatically worse or better than others can be
expected to arise as a result of rotation of the designs. In particular, if $k = N-1$, it is not
possible to keep a selected estimate clear of bias.
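The invariance can be checked numerically by submitting a design to an arbitrary rotation. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the rotation is generated by orthogonalizing random normal vectors, and the seed is fixed only so that the run is repeatable.

```python
import math
import random


def random_rotation(k, rng):
    """k x k orthogonal matrix obtained by orthogonalizing random normal vectors."""
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(k)]
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:
            basis.append([x / norm for x in v])
    return basis


def alias_gram(design):
    """AA' with A = X1'X2/N, product columns of X2 being weighted by sqrt(2)."""
    n, k = len(design), len(design[0])
    x1 = [[1.0] + list(r) for r in design]
    x2 = [[r[i] ** 2 for i in range(k)] +
          [math.sqrt(2) * r[i] * r[j] for i in range(k) for j in range(i + 1, k)]
          for r in design]
    a = [[sum(x1[u][h] * x2[u][c] for u in range(n)) / n for c in range(len(x2[0]))]
         for h in range(k + 1)]
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in a] for ri in a]


d1 = [[-1, -1, 1], [1, -1, -1], [-1, 1, -1], [1, 1, 1]]
g = random_rotation(3, random.Random(1952))
rotated = [[sum(row[i] * g[i][j] for i in range(3)) for j in range(3)] for row in d1]

for name, d in (("D1", d1), ("rotated", rotated)):
    aa = alias_gram(d)
    print(name, [round(aa[h][h], 6) for h in range(4)])
```

With $N = 4$ and $k = 3$ both lines should show diagonal elements $N-1 = 3$ for $b_0$ and $N-2 = 2$ for each of $b_1, b_2, b_3$, although the individual bias coefficients differ between the two orientations.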


6. When, on the other hand, something is known of the type of approximating second-degree
equation to be expected it might be possible to reduce bias by suitable rotation of
the design. Consider a particular class of designs which are such that only $b_0$ is biased by
quadratic terms. For this to happen each of the $N-1$ column vectors in $\mathbf{X}_1$ corresponding
to first-order effects must have zero inner product with the $N-1$ column vectors in $\mathbf{X}_2$
corresponding to quadratic effects. This can only happen if the latter have all elements
equal to $+1$, which in turn implies that elements in $\mathbf{D}$ consist entirely of $+1$'s and $-1$'s.
These designs are those obtained by Plackett & Burman. Now the response contours
generated by the second degree approximating equation are a set of conics. Suppose the
direction of the principal axes of the system were known. Then because of the property
mentioned above, if any of the designs of Plackett & Burman were rotated so that their
axes were parallel to these principal axes the effects $b_1, \dots, b_k$ would be unbiased, since in the
new variables the second-degree equation contains no product terms. This may have
practical application in the exploration of ‘ridge’ systems. Such systems occur, for example, 
when a line or plane of near maxima rather than a single point maximum is found. The 
probable existence and direction of such systems can sometimes be deduced from theoretical 
considerations, in which case it might be an advantage to rotate axes of the design so as to be 
parallel to these suspected ‘ridges’. 

It is worth noting (Box & Wilson, 1951) that by replication of any design with change of 
signs a design of ‘Type B’ is obtained. That is to say, one in which the first-order estimates 
are unbiased by terms of second order. This of course applies to all the designs discussed here. 










ELIMINATION OF SYSTEMATIC VARIATION 


7. If the planar approximation is adequate we have seen that first-order designs of 
maximum efficiency are obtained by writing down any k mutually orthogonal column 
vectors each containing N elements and each orthogonal to U, a column vector of unit 
elements. The latter requirement allows for the elimination of the mean. Now it is easy to 
show that if any p column vectors are arbitrarily taken we can always write down further 
N-—p column vectors which are mutually orthogonal and each of which is orthogonal to 
each of the original p vectors. If therefore a first-order design is required for k variables 
which is such that, not only the mean but also a number of other systematic effects, such as 
time trends and block effects, are to be eliminated, this may be done by choosing the 
k orthogonal vectors of the design such that they are orthogonal to the mean and also to 
a set of $p-1 \le N-k-1$ vectors representing these additional systematic effects.

A field in which there is a need for designs of this sort, and in which the extra trouble in 
the design and execution of the experiment is amply justified, is that of large-scale plant 
experiments. Here to avoid the possibility of serious loss of purity or yield, only small 
changes in the factor levels could usually be tolerated. This would tend to ensure the 
relative predominance of first-order effects, but a sensitive experiment would be required 
to detect such effects which would be necessarily small in magnitude. Furthermore, daily 
alterations to plant conditions would usually not be practicable. On some types of process 
if a series of trials were to be made it would be necessary to run each for a week at least. 
This replication would reduce random errors, but long-term trends which are of common 
occurrence would not be reduced. The usual way of overcoming this difficulty, of course, 
would be to make comparisons within blocks of weeks, using partial confounding if necessary 
to reduce block size. An interesting alternative procedure employs the principle discussed 
above. 

Fig. 2 shows a set of fourteen consecutive weekly averages for a certain quality 
characteristic; it is obvious that a systematic trend with time typical of this particular 
process is occurring in the individual results. The following analysis is obtained by fitting 
orthogonal polynomials up to thirteenth order: 

















Order of term        Sum of squares
      1                   0.08
      2                  45.00
      3                   8.56
      4                   2.11
      5                   0.80
      6                   0.87
      7                   0.13
      8                   ...
      9                   ...
     10                   0.61
     11                   1.77
     12                   0.68
     13                   0.58
   Total                 65.25

Terms up to fifth order  56.55
Remainder                 8.70

















(A useful table giving the orthogonal polynomials for equally spaced ordinates and N < 26 
has recently been published by De Lury (1950).) 










Analysis of other sets of data from this process covering periods of the same length showed 
that terms up to fifth order usually accounted for a large proportion of the variation. 
Suppose now that it was desired to plan a set of fourteen trials on this process each lasting 
a week, by means of which the first-order effects of three quantitative factors were to be 
determined with maximum precision and the trend effect eliminated. On the assumption 
that the ‘trend’ could be represented by a polynomial of fifth degree plus independently 
distributed error terms, the levels in the 14 x 3 design matrix could be taken with column 
elements proportional to any three of the orthogonal polynomials higher than the fifth order 
or to any three orthogonal linear combinations of these; the remaining five degrees of 
freedom corresponding to high-order polynomial effects not used up would supply an
internal estimate of experimental error appropriate for calculating confidence limits for the 
effects. 

Fig. 2. Fourteen consecutive weekly averages for a quality characteristic. (Vertical axis: average value; horizontal axis: week.)


The assumption that the trend would be represented by a polynomial on time plus 
independently distributed error terms and consequently that the residual variation would 
be distributed evenly through the higher order terms would often not be justifiable however. 
In particular, negative correlation between successive errors which might occur due to 
incomplete segregation of individual batches could lead to inflated higher order terms. This 
difficulty may be overcome by the device of randomization. In fact, we choose the directions 
of the three design vectors in the eight-dimensional residual subspace entirely at random 
subject only to the restriction that they should be mutually orthogonal. ‘Angular’ randomization
of this sort is achieved conveniently by using a table of random normal deviates
such as that of Wold (1948). Any $n$ values of such a table provide a vector $z_1$ having equal
probability of lying in any direction in the $n$-space. A second vector $z_{2.1}$, whose direction
is random, subject only to the restriction that it is orthogonal to the first, may now be
constructed by choosing a second set of values $z_2$ from the table and calculating
$$z_{2.1} = z_2 - b_{2.1}z_1, \qquad (15)$$
where $b_{2.1} = z_1'z_2/z_1'z_1$.
In a similar way a third vector $z_{3.21}$ orthogonal to the first two may be constructed from
$$z_{3.21} = z_3 - b_{32.1}z_{2.1} - b_{3.1}z_1, \qquad (16)$$
where $b_{32.1} = z_3'z_{2.1}/z_{2.1}'z_{2.1}$ and $b_{3.1} = z_3'z_1/z_1'z_1$,
and so on. Having obtained $z_1, z_{2.1}, z_{3.21}$, etc., it is convenient to standardize them by
dividing the elements of each vector by the square root of the sum of squares of its elements.
These standardized vectors may be denoted by $Z_1, Z_{2.1}, Z_{3.21}$, etc.; their elements are the
direction cosines of lines which are mutually perpendicular but whose orientation in the
$n$-space is otherwise random.

In the example quoted the three sets of eight random normal deviates taken from the
table were

$z_1$ = [-0.69   1.40   1.79  -0.83   0.34   0.19   1.46   0.21],

$z_2$ = [ 0.73  -1.00   0.81   0.01  -1.13  -1.53  -0.07  -0.22],

$z_3$ = [-0.50  -1.59   0.82   0.71   1.37   0.90   0.74  -0.83].

From which were obtained

$Z_1$ = [-0.235   0.476   0.608  -0.282   0.116   0.065   0.496   0.071],

$Z_{2.1}$ = [ 0.265  -0.332   0.453  -0.046  -0.453  -0.630   0.063  -0.080],

$Z_{3.21}$ = [-0.133  -0.645   0.310   0.270   0.429   0.241   0.241  -0.313].
These were taken as the direction cosines of the three design vectors in an eight-dimensional 
subspace whose axes of reference in the fourteen-dimensional sample space were the eight 
vectors given by the elements of orthogonal polynomials of order 6, 7, ..., 13. 
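The successive orthogonalization and standardization can be retraced from the deviates quoted above. The sketch below is illustrative; the function names are hypothetical, and the results should agree with the printed direction cosines to about two places of decimals, small differences in the third place presumably reflecting rounding in the original computation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(vectors):
    """Successive residuals z1, z2.1, z3.21, ...: each vector less its regression on the earlier ones."""
    residuals = []
    for z in vectors:
        r = list(z)
        for e in residuals:
            coeff = dot(z, e) / dot(e, e)   # e.g. b_2.1 = z1'z2 / z1'z1
            r = [ri - coeff * ei for ri, ei in zip(r, e)]
        residuals.append(r)
    return residuals


def unit_length(v):
    """Divide by the square root of the sum of squares, giving direction cosines."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


z1 = [-0.69, 1.40, 1.79, -0.83, 0.34, 0.19, 1.46, 0.21]
z2 = [0.73, -1.00, 0.81, 0.01, -1.13, -1.53, -0.07, -0.22]
z3 = [-0.50, -1.59, 0.82, 0.71, 1.37, 0.90, 0.74, -0.83]

Z = [unit_length(r) for r in orthogonalize([z1, z2, z3])]
for row in Z:
    print([round(x, 3) for x in row])
```

Because the earlier residuals are already mutually orthogonal, regressing each new vector on them one at a time gives the same result as a joint regression.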

The design vectors in the original co-ordinate system were then obtained by calculating 
linear combinations of the orthogonal polynomials using the elements of $Z_1$, $Z_{2.1}$ and $Z_{3.21}$
as coefficients. Thus the first vector is given by
$$x_1 = -0.235\,\xi_6 + 0.476\,\xi_7 + \dots + 0.071\,\xi_{13}, \qquad (17)$$
where $\xi_s$ is the vector of elements of the $s$th orthogonal polynomial standardized so that
$\xi_s'\xi_s = 14$. The design* thus obtained is given below.











Observation          Factor levels
                 x_1      x_2      x_3
     1          -0.2      0.5      0.3
     2           0.1     -1.5     -1.1
     3           1.2      1.0      1.1
     4          -1.6      0.5      0.2
     5          -0.4      0.5      0.0
     6           0.0     -1.6     -1.5
     7           1.6     -0.1      0.5
     8           0.6      1.4      0.0
     9          -2.0     -0.2      1.4
    10           0.1     -1.4      0.2
    11          -0.1      1.7     -2.3
    12           1.3     -0.7      1.1
    13          -0.9     -0.1      0.4
    14           0.2      0.1     -0.3




















* Only a single place of decimals is given. More accurate values could of course be used if it
were possible to control the levels of the factors more precisely.

To use the design the three factor levels are varied in proportion to these design elements
in suitable units $S_i$ in the manner described in § 1. Any trend occurring during the course of
the experiment which can be represented by a polynomial of up to fifth degree may be
eliminated without loss of efficiency. The significance of the effects may be conveniently
assessed by means of the analysis of variance:











Source                             Degrees of freedom
Due to fifth-order polynomial              5
x_1                                        1
x_2                                        1
x_3                                        1
Residual (error)                           5
Total                                     13














Since the normal theory significance tests depend only on the angles between the vector 
of observations and the design vectors, angular randomization ensures that these tests 
are exact whatever the nature of the residual variation. The use of these designs is 
therefore justified whenever the procedure is likely to reduce the residual variation. Because 
of randomization no assumption need now be made that the trend can be accurately 
represented by the formal mathematical model. 
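A complete trend-free design of this kind can be generated in a few lines. The sketch below does not follow the route of the text exactly: instead of forming linear combinations of the orthogonal polynomials of order 6 to 13, it takes random normal vectors in the full fourteen-dimensional space and removes their components along the polynomials of degree 0 to 5 and along the design vectors already chosen. On the assumption that the deviates are independent normal, this should give the same distribution of directions in the residual subspace. The function names and the seed are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_components(v, basis):
    """Subtract from v its projection on each vector of a mutually orthogonal basis."""
    r = list(v)
    for e in basis:
        coeff = dot(r, e) / dot(e, e)
        r = [ri - coeff * ei for ri, ei in zip(r, e)]
    return r


def trend_free_design(n, k, degree, rng):
    """k design columns of length n, orthogonal to each other and to time polynomials up to `degree`."""
    t = [u - (n - 1) / 2 for u in range(n)]        # centred, equally spaced times
    largest = max(abs(x) for x in t)
    t = [x / largest for x in t]                    # scaled to [-1, 1] for numerical stability
    trend = []
    for d in range(degree + 1):                     # orthogonal polynomials of degree 0..degree
        trend.append(remove_components([x ** d for x in t], trend))
    columns = []
    for _ in range(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = remove_components(z, trend + columns)
        norm = math.sqrt(dot(r, r) / n)             # standardize so that the sum of squares is n
        columns.append([x / norm for x in r])
    return t, columns


rng = random.Random(1952)   # fixed seed only so that the sketch is repeatable
t, design = trend_free_design(14, 3, 5, rng)
for u in range(14):
    print(u + 1, [round(col[u], 1) for col in design])
```

Each column has zero sum and sum of squares 14, so the levels can be converted to working units in the manner described in § 1.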


8. In general, when quantitative factors are not restricted to a few fixed levels the scope 
and flexibility of experimental design is greatly increased. It is hoped to show in a later 
paper how these ideas may be applied with designs of higher order, and to discuss in more 
detail the interesting implications of ‘angular’ randomization. 


The drawings were executed by Mr P. S. Ward, to whom I wish to express my best thanks. 


REFERENCES 


Aitken, A. C. (1948). Determinants and Matrices, 5th ed. Edinburgh and London: Oliver and Boyd.

Box, G. E. P. & Wilson, K. B. (1951). J.R. Statist. Soc., Series B, 13, 1.

De Lury, D. B. (1950). Values of the Integrals of the Orthogonal Polynomials up to n = 26. University
of Toronto Press.

Plackett, R. L. & Burman, J. P. (1946). Biometrika, 33, 305.

Wold, H. (1948). Tracts for Computers, no. 25. Cambridge University Press.













TESTS OF SIGNIFICANCE IN CANONICAL ANALYSIS 


By F. H. C. MARRIOTT 
University of Aberdeen 


1. INTRODUCTION

Since the earliest papers on multivariate analysis, some twenty years ago, great advances 
have been made in the techniques used in dealing with multiple measurements. However, 
there are still considerable difficulties in applying the theory in practical cases. In the 
simplest case, discrimination between two groups, the problem reduces to one of multiple 
regression, but in the more general cases of canonical correlation between two sets of 
variables and discrimination between more than two groups further problems arise which 
have not yet been completely investigated. 

Suppose it is desired to investigate the relationship between two sets of variables $x_1 \dots x_p$
and $y_1 \dots y_q$ measured on $n+1$ individuals. The correlations between linear functions of the
$x$'s and $y$'s will be considered; there is consequently a complete duality between the two,
and there is no loss of generality in taking $p \le q$. In special cases, either the $x$'s or the $y$'s
may be dummy variables corresponding to a division into $p+1$ or $q+1$ groups.

Now the dispersion matrix (the matrix of corrected sums of squares and products) of the
$y$'s, say $T$, with elements $T_{ij}$, may be broken up into two parts: $W$ is the dispersion matrix
of the $y$'s with the $x$'s eliminated, i.e. the matrix of residual sums of squares and products
of the $y$'s after a regression on the $x$'s has been taken out, and $Q = T - W$ the dispersion
matrix due to regression. If any linear function of the $y$'s, say $b_iy_i$ (using the summation
convention), is considered, an analysis of variance may be carried out on it thus:





                          Degrees of freedom     Sum of squares
Regression on the x's             p               b_i b_j Q_ij
Residual                        n - p             b_i b_j W_ij
Total                             n               b_i b_j T_ij











Of the possible vectors $b$, one gives a maximum value to the variance ratio
$\dfrac{b_ib_jQ_{ij}}{b_ib_jW_{ij}} \times \dfrac{n-p}{p}$,
and this vector satisfies the $q$ differential equations
$$\frac{\partial}{\partial b_h}\left[\frac{b_ib_jQ_{ij}}{b_ib_jT_{ij}}\right] = 0 \quad (h = 1, \dots, q),$$
i.e.
$$b_iQ_{hi}(b_ib_jT_{ij}) - b_iT_{hi}(b_ib_jQ_{ij}) = 0,$$
or, writing $l^2 = \dfrac{b_ib_jQ_{ij}}{b_ib_jT_{ij}}$,
$$b(Q - l^2T) = 0, \qquad |Q - l^2T| = 0.$$
Now, except in trivial cases, $T$ is non-singular and $Q$ is of rank $p$. Consequently, there are
just $p$ distinct non-zero values of $l^2$, the latent roots of $T^{-1}Q$, and $p$ corresponding latent
vectors $b$. The $p$ positive values of $l$, arranged in descending order of magnitude $l_1 \dots l_p$, are
known as the first, second, ..., canonical correlations between $x$ and $y$, and the corresponding
functions $b_iy_i$ as the first, second, ..., discriminant functions. The $l$'s are estimates of the
parent canonical correlations $\lambda_1 \dots \lambda_p$. The quantity
$$L = \prod_{i=1}^{p}(1 - l_i^2) = \frac{|W|}{|T|}$$
has been proposed by Wilks (1932) as a measurement of the degree of association between
the $x$'s and the $y$'s. $L$ is small when one or more of the $l$'s is close to 1, and is nearly 1 when
there is little association between the $x$'s and $y$'s.


2. TESTS OF SIGNIFICANCE 


When $p = 1$, the significance of $L$ may obviously be tested by a variance ratio test.
$(1-L)(n-q)/(Lq)$ is distributed as a variance ratio with $q$ and $n-q$ degrees of freedom. The
exact test when $p = 2$ was given by Pearson & Wilks (1933). $(1-\sqrt{L})(n-q-1)/(q\sqrt{L})$ is
distributed as a variance ratio with $2q$ and $2(n-q-1)$ degrees of freedom.

When $p > 2$, no exact test is available. The approximate test $\chi^2_{pq} = -n\log_e L$ is derivable
from the theory of likelihood ratio tests, and Bartlett (1938) has obtained from the moments
of $\log_e L$ the more accurate test
$$\chi^2_{pq} = -\{n - \tfrac{1}{2}(p+q+1)\}\log_e L.$$
This test is sufficiently accurate for most purposes, but Rao (1948) has established a refinement
of the test whereby the probability of an observed $L$ is obtained as the sum of an
asymptotic series. For all practical purposes, Rao's test may be regarded as an exact test
of significance for $L$.
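The criteria quoted above are easily computed from a set of sample canonical correlations. The sketch below is illustrative only: the function names and the numerical values are hypothetical, and the comparison of each statistic with tabulated percentage points is left to the reader.

```python
import math


def wilks_criterion(corrs):
    """L = product over i of (1 - l_i^2) for the sample canonical correlations l_i."""
    value = 1.0
    for l in corrs:
        value *= 1.0 - l * l
    return value


def bartlett_chi_square(corrs, n, q):
    """Bartlett's approximate criterion and its degrees of freedom pq."""
    p = len(corrs)
    statistic = -(n - 0.5 * (p + q + 1)) * math.log(wilks_criterion(corrs))
    return statistic, p * q


def exact_variance_ratio(corrs, n, q):
    """Variance ratio and its two degrees of freedom for p = 1 or p = 2."""
    p = len(corrs)
    big_l = wilks_criterion(corrs)
    if p == 1:
        return (1 - big_l) * (n - q) / (big_l * q), q, n - q
    if p == 2:
        root = math.sqrt(big_l)
        return (1 - root) * (n - q - 1) / (q * root), 2 * q, 2 * (n - q - 1)
    raise ValueError("no exact variance ratio test is quoted for p > 2")


# Hypothetical example: p = 2, q = 3 and n + 1 = 31 individuals.
corrs = [0.6, 0.3]
print(wilks_criterion(corrs))
print(bartlett_chi_square(corrs, n=30, q=3))
print(exact_variance_ratio(corrs, n=30, q=3))
```

For $p > 2$ only the approximate criterion is returned by these functions; Rao's refinement is not attempted here.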
Further, if A, +0 and 1, is an estimate of it, the association between the 2’s and the y’s, 


apart from the first canonical correlation A,, may be measured by L’ = tl (1-22). Bartlett 
i=2 
(1938) has shown that 


Xiin-na-m = —{n-4(pt+q+ 1)}log, L’ 
provides an approximate test of significance for L’. Similarly, the association can be tested 
when more than one non-zero canonical correlation has been eliminated. 

If the alternative hypothesis to $\lambda_1 = \ldots = \lambda_p = 0$ is $\lambda_1 \neq 0$, $\lambda_2 = \ldots = \lambda_p = 0$, L does not provide
an efficient test of independence. A test of significance for $l_1$, the largest sample canonical
correlation, is required. Similarly, if $l_1$ is obviously significant, a test of significance for
$l_2$ may be required. These tests are not at present available, and it is the purpose of this
paper to provide them in some cases.

The problem of finding most efficient tests in this instance is complicated by the fact that
$l_1$ does not necessarily correspond to $\lambda_1$. The estimated discriminant function associated
with $l_1$ may not be an estimate of the best discriminant function in the population.


3. THE DISTRIBUTION OF THE GREATEST CANONICAL CORRELATION COEFFICIENT 


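The joint distribution of the canonical correlations in the null case has been given by Fisher
(1939) and others. Putting $u_i = l_i^2$, the distribution may be written
$$dF = C \prod_{i=1}^{p} \Big\{ u_i^{a-1} (1-u_i)^{b-1} \prod_{j=i+1}^{p} (u_i - u_j) \Big\}\, du_i,$$
where $a = \tfrac{1}{2}(q-p+1)$, $b = \tfrac{1}{2}(n-q-p+1)$,
$$C = \pi^{p/2} \prod_{i=1}^{p} \frac{\Gamma[\tfrac{1}{2}(n-i+1)]}{\Gamma[\tfrac{1}{2}(q-i+1)]\, \Gamma[\tfrac{1}{2}(n-q-i+1)]\, \Gamma[\tfrac{1}{2}(p-i+1)]}.$$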











60 | Tests of significance in canonical analysis 


Three possible ways of finding a test of significance for $l_1$ suggest themselves:
(1) By integrating out $l_2, \ldots, l_p$ the distribution function of $l_1$ can be obtained. The
probability that $u_1 > U_1$ is given by
$$P = \int_{U_1}^{1} \int_{0}^{u_1} \cdots \int_{0}^{u_{p-1}} dF.$$


In the general case this integral is not easy to evaluate. However, as will be shown later, 
it is possible to obtain the integral in certain special cases, and in other cases it is possible 
to derive a significance test, even though the complete integral is not found. 

(2) The $\chi^2$ tests for L suggest that a similar approximate test might be derived for $l_1$.
$\chi^2_{pq} = -\{n - \tfrac{1}{2}(p+q+1)\} \log_e (1 - l_1^2)$ obviously provides an upper limit for the significance
point. Further, consider the approximate break-up of the $\chi^2$ for testing L.

    $\chi^2$                                                           Degrees of freedom
    $-\{n - \tfrac{1}{2}(p+q+1)\} \log_e (1 - l_1^2)$                     $p+q-1$
    $-\{n - \tfrac{1}{2}(p+q+1)\} \log_e \prod_{i=2}^{p} (1 - l_i^2)$     $(p-1)(q-1)$
    $-\{n - \tfrac{1}{2}(p+q+1)\} \log_e L$                               $pq$

The second component of $\chi^2$ tests L' and the total $\chi^2$ tests L. The first component does not
provide a test for $l_1$, since the break-up is only valid if $\lambda_1 \neq 0$. If $\lambda_1 = 0$, the selection of the
largest l makes the first component a biased test, giving a lower limit for the significance point.
These two tests suggest a possible test
$$\chi^2_D = -\{n - \tfrac{1}{2}(p+q+1)\} \log_e (1 - l_1^2),$$
where D is a symmetrical function of p and q, lying between p + q - 1 and pq, and therefore
equal to q when p = 1. Such a test will be discussed in §6 of this paper.
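(3) In view of the difficulty of obtaining the exact distribution, Bartlett (1941) has
proposed a rather different test based on the distribution of $l_1$ given $l_2, \ldots, l_p$:
$$P = \frac{\displaystyle\int_{U_1}^{1} u_1^{a-1} (1-u_1)^{b-1} \prod_{i=2}^{p} (u_1 - U_i)\, du_1}{\displaystyle\int_{U_2}^{1} u_1^{a-1} (1-u_1)^{b-1} \prod_{i=2}^{p} (u_1 - U_i)\, du_1}.$$
This test may be written in the form
$$P = \frac{\theta(U_1)}{\theta(U_2)},$$
where
$$\theta(U) = B(a+p, b) - B_U(a+p, b) - \sum_{i=2}^{p} U_i \{B(a+p-1, b) - B_U(a+p-1, b)\} + \ldots$$
and
$$B_U(a, b) = \int_{0}^{U} x^{a-1} (1-x)^{b-1}\, dx.$$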


The use of this test is open to various objections. First, the test is almost certainly less
powerful, although the fact that $l_1$ is not necessarily an estimate of $\lambda_1$ means that neither
is uniformly most powerful. Secondly, this test is tedious and difficult to apply, and cannot
be used to tabulate significance levels. Thirdly, if $l_2$ is close to $l_1$, the test may not give













significance even to a high value of $l_1$. This is not such a serious fault as might at first appear;
the chief use of the test is when L is not significant, and if two of the l's are large, this will
usually show up in the test of L.


4. TESTS OF SIGNIFICANCE FOR $l_1$*
In certain cases it is possible to obtain the exact integral to test the significance of $l_1$. For
example, if p = 2,
$$P = 1 - I_{U_1}(2a, 2b) + \frac{B_{U_1}(a, b)}{2B(2a, 2b)}\, U_1^{a} (1 - U_1)^{b},$$
where a and b are defined as above, i.e. $a = \tfrac{1}{2}(q-p+1)$, $b = \tfrac{1}{2}(n-q-p+1)$, and
$$I_{U_1}(a, b) = \frac{B_{U_1}(a, b)}{B(a, b)}.$$

In particular, when p = 2, q = 3,
$$P = (2b+1)\, U_1 (1-U_1)^{b} + (1-U_1)^{2b+1},$$
and when p = 2, q = 5,
$$P = \tfrac{1}{3}(2b+1)(2b+3)\, U_1^{2} (1-U_1)^{b} + \tfrac{1}{3}(1-U_1)^{2b+1} \{b(2b+1) U_1^{2} + 3(2b+1) U_1 + 3\}.$$
When P is small, i.e. near the significance levels, terms of high order in $1 - U_1$ may be
neglected and reduced approximate forms are found. These approximate tests should be
used only when P is small, but the error in using them near the 5% point is quite negligible
(generally of the order of 0.00002 in 0.05, or 0.04%). The approximate tests are
p = 2, q = 3: $P \approx (2b+1)\, U_1 (1-U_1)^{b}$,
p = 2, q = 5: $P \approx \tfrac{1}{3}(2b+1)(2b+3)\, U_1^{2} (1-U_1)^{b}$.
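To illustrate how little is lost by dropping the higher-order term, the exact and approximate forms for p = 2 with q = 3 and q = 5 can be evaluated side by side. The sketch below assumes the formulae $P = (2b+1)U_1(1-U_1)^b + (1-U_1)^{2b+1}$ for q = 3 and $P = \tfrac{1}{3}(2b+1)(2b+3)U_1^2(1-U_1)^b + \tfrac{1}{3}(1-U_1)^{2b+1}\{b(2b+1)U_1^2 + 3(2b+1)U_1 + 3\}$ for q = 5; the function names and the values of n and $U_1$ are hypothetical.

```python
def b_parameter(n, p, q):
    """b = (n - q - p + 1) / 2, as defined in the text."""
    return 0.5 * (n - q - p + 1)

def approx_p2_q3(U, b):
    """Leading term only (valid when P is small)."""
    return (2 * b + 1) * U * (1 - U) ** b

def exact_p2_q3(U, b):
    """Leading term plus the term in (1 - U)^(2b + 1)."""
    return approx_p2_q3(U, b) + (1 - U) ** (2 * b + 1)

def approx_p2_q5(U, b):
    return (2 * b + 1) * (2 * b + 3) * U ** 2 * (1 - U) ** b / 3.0

def exact_p2_q5(U, b):
    tail = (1 - U) ** (2 * b + 1) * (b * (2 * b + 1) * U ** 2 + 3 * (2 * b + 1) * U + 3) / 3.0
    return approx_p2_q5(U, b) + tail

# Hypothetical case: n = 20, q = 3, observed l_1^2 = 0.5.
b = b_parameter(20, 2, 3)
print(exact_p2_q3(0.5, b), approx_p2_q3(0.5, b))
```

Both exact forms equal 1 at $U_1 = 0$ and 0 at $U_1 = 1$, as a tail probability must.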
When p = 3, q = 4, the exact test is
$$P = (1-U_1)^{b} \{U_1^{2}(b+1)(2b+3) - U_1(2b+3) + 1\}
  + (1-U_1)^{2b+1} \{U_1^{2}(b+1)(2b+1) + U_1(2b+1) + 1\}
  - (1-U_1)^{3b+3}.$$

When P is small, the first term only need be used.

In the general case p = 3 the exact test is difficult to obtain, but a similar approximation
gives
$$P \approx 1 - I_{U_1}(a, b) + \frac{(a+b)(2a+2b+1)}{ab\, B(a, b)} \Big(U_1 - \frac{a}{a+b}\Big) U_1^{a} (1-U_1)^{b},$$
which reduces to the first term of the test for p = 3, q = 4, when a = 1.

Finally, for p = 4, q = 5, the approximate test is
$$P \approx \tfrac{1}{3}(2b+3)(1-U_1)^{b} \{(b+2)(2b+5) U_1^{3} - 3(2b+5) U_1^{2} + 6U_1\}.$$

These tests may be used to test the significance of an observed value of $l_1$ or to tabulate
significance levels of $l_1$. Further, in the rather common situation when $\lambda_1$ is obviously not
zero ($l_1$ being highly significant), the tests may be used to test $l_2$, reducing p and q by 1. If
tables of the incomplete B-function are available the tests are not difficult to apply.


Roy (1945) has recently found the distribution of the largest canonical correlation, and 
other individual canonical correlations, on the null hypothesis. Roy’s distribution is actually 








* The derivation of the tests in this section is given in the author's unpublished thesis 'The analysis
and interpretation of multiple measurements', University of Aberdeen, 1951.










that of the 'k-statistics', which are related to the canonical correlations by equations of
the form
$$k_p = \frac{l_p^2}{1 - l_p^2}.$$
The problem of the distribution of the largest canonical correlation has thus been solved 
theoretically, but the forms obtained by Roy are not immediately suitable for numerical 
computation. Tests of significance can more easily be carried out (for p = 2,p = 3, and 
p = 4, q = 5) using the formulae of this section. 








5. NUMERICAL RESULTS 


The tabulation of the distribution of $l_1$ would involve tables of quadruple entry (for p, q, n,
and significance level). The significance levels can only be found by a tedious process of
trial and error, and it would not be worth while to construct such tables. However, experiments
involving multiple measurements are usually only worth doing when a fairly large
number of observations can be taken, and the following table of significance levels for large n
should prove useful. The table gives the 5% and 1% levels of $\tfrac{1}{2}nl^2$ for n large and p = 2,
q = 2(1)12, 21; p = 3, q = 3(1)12, 21; p = 4, q = 5.


Significance levels for $\tfrac{1}{2}nl^2$ (n large)

            p = 2                          p = 3
     q      5%       1%             q      5%       1%
     2     4.30     6.08            3     6.56     8.59
     3     5.37     7.28            4     7.62     9.76
     4     6.34     8.36            5     8.59    10.82
     5     7.24     9.37            6     9.54    11.84
     6     8.11    10.32            7    10.44    12.82
     7     8.94    11.23            8    11.31    13.76
     8     9.75    12.12            9    12.16    14.67
     9    10.53    12.98           10    12.98    15.56
    10    11.30    13.82           11    13.79    16.43
    11    12.06    14.64           12    14.59    17.28
    12    12.81    15.44           21    21.30    24.45
    21    19.15    22.27

            p = 4
     q      5%       1%
     5     9.81    12.12











Significance levels for intermediate values of q may be found by interpolation. The levels
given are reasonably accurate for n > 100; for lower values of n, rather lower values of
$\tfrac{1}{2}nl^2$ are required for significance.
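An individual large-n entry can be checked numerically. The sketch below rests on an assumption not stated in the paper: taking the exact p = 2, q = 3 form $P = (2b+1)U_1(1-U_1)^b + (1-U_1)^{2b+1}$ and letting n grow with $x = \tfrac{1}{2}nl^2$ held fixed, so that $(1-U_1)^b \to e^{-x}$ and $(2b+1)U_1 \to 2x$, gives the limiting tail probability $2xe^{-x} + e^{-2x}$. The function names are hypothetical; the equation is solved by bisection.

```python
import math

def limiting_tail(x):
    """Assumed large-n limit of P for p = 2, q = 3, with x = n l^2 / 2."""
    return 2.0 * x * math.exp(-x) + math.exp(-2.0 * x)

def limiting_level(alpha, lo=0.0, hi=50.0, tol=1e-10):
    """Bisection for the x at which limiting_tail(x) = alpha (tail is decreasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if limiting_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(limiting_level(0.05), 2))
print(round(limiting_level(0.01), 2))
```

Under this assumption the two roots agree, to two decimal places, with the tabulated 5% and 1% levels for p = 2, q = 3.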










6. AN APPROXIMATE TEST 


Although the tests proposed in §4 are not difficult to apply, an alternative approximate test 
appears desirable for two reasons. First, tables of the incomplete B-function are not always 
available, and secondly, an approximate test which gave good results over the whole range 
of values at present known might reasonably be used outside this range, in default of any 
better test. 

The possible test $\chi^2_D = -\{n - \tfrac{1}{2}(p+q+1)\} \log_e (1 - l_1^2)$ has already been mentioned. For
p = 1 or q = 1, D = p + q - 1, and for higher values of p and q, p + q - 1 < D < pq. It seems
reasonable to take $D = p + q - 1 + \alpha\{(p-1)(q-1)\}^{\beta}$. The values of $\alpha$ and $\beta$ which best fit
the percentage points given by the tests in the previous sections are found to be $\alpha = \tfrac{1}{2}$, $\beta = \tfrac{2}{3}$,*
so that $D = p + q - 1 + \tfrac{1}{2}\{(p-1)(q-1)\}^{2/3}$. D, of course, is not in general integral, and
percentage points of $\chi^2$ for fractional degrees of freedom are to be obtained by interpolation
in the $\chi^2$ table.
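The fractional degrees of freedom are easily computed. The following sketch assumes $D = p + q - 1 + \tfrac{1}{2}\{(p-1)(q-1)\}^{2/3}$; the function name is hypothetical.

```python
def approximate_df(p, q):
    """D = p + q - 1 + (1/2) * ((p - 1)(q - 1))^(2/3)."""
    return p + q - 1 + 0.5 * ((p - 1) * (q - 1)) ** (2.0 / 3.0)

for p, q in [(1, 4), (2, 2), (2, 5), (3, 4), (4, 5)]:
    D = approximate_df(p, q)
    # D should lie between p + q - 1 and pq, and be symmetrical in p and q.
    print(p, q, round(D, 2), p + q - 1 <= D <= p * q)
```

For p = 1 the additional term vanishes and D reduces to q, as required; for p = q = 2 it gives D = 3.5, so the percentage point is taken between the tabulated values for 3 and 4 degrees of freedom.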

A few percentage points obtained by this test have been calculated for comparison with 
those in the previous section. 


Significance levels for $\tfrac{1}{2}nl^2$ (n large) from the approximate test

            p = 2                          p = 3
     q      5%       1%             q      5%       1%
     2     4.33     6.15            3     6.49     8.62
     6     8.08    10.41            6     9.37    11.85
    12    12.80    15.62           12    14.39    17.33
    21    19.24    22.61           21    21.19    24.70

            p = 4
     q      5%       1%
     5     9.58    12.07

















The test provides a good approximation over the range considered. There is some 
indication that the test tends to overestimate the significance for higher values of p, but 
with this reservation it seems reasonable to use it outside the range of the tables. 


* An additional term in D of the form $\alpha\{(p-1)(q-1)\}^{\beta}$ was assumed (after various other possibilities
had been considered) and the best values of $\alpha$ and $\beta$ found by a graphical method. The degrees of freedom
corresponding to the exact value, d say, were found. $\log\{d - (p+q-1)\}$ was then plotted against
$\log\{(p-1)(q-1)\}$, and $\alpha$ and $\beta$ estimated. The values, by a fortunate coincidence, were 0.50 and 0.67
respectively. A slight improvement could be obtained by taking the additional term to be of the form
$\alpha(p-1)^{\beta}(q-1)^{\gamma}$, but the small gain in accuracy seems to be more than offset by the loss of symmetry
and simplicity.













As an example of the way in which the test works for small values of n, consider the
5% point for p = 2, q = 5. Here $l^2 = 1 - \exp\{-14.44/(n-4)\}$:

5% points of $l^2$ for p = 2, q = 5

     n              Exact test    Approximate test
     10               0.930           0.910
     20               0.600           0.594
     50               0.270           0.269
     100              0.140           0.140
     $\infty$ ($nl^2$)      14.49           14.44

















In this case the approximate test gives good results down to n = 20. The value n = 10 is 
included only for comparative purposes; it would, of course, be absurd to do a multivariate 
analysis with such low numbers. 
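The comparison above can be reproduced approximately. The sketch below takes the value 14.44 quoted in the text as the 5% point of $\chi^2$ with D degrees of freedom, and, for the exact column, solves the p = 2, q = 5 form $P = \tfrac{1}{3}(2b+1)(2b+3)U_1^2(1-U_1)^b + \tfrac{1}{3}(1-U_1)^{2b+1}\{b(2b+1)U_1^2 + 3(2b+1)U_1 + 3\} = 0.05$ by bisection, with $b = \tfrac{1}{2}(n-q-p+1)$. Function names are hypothetical, and small rounding differences from the printed table are to be expected.

```python
import math

def approximate_point(n, chi2_point=14.44, p=2, q=5):
    """l^2 = 1 - exp(-chi2 / (n - (p + q + 1)/2)); chi2_point is taken from the text."""
    return 1.0 - math.exp(-chi2_point / (n - 0.5 * (p + q + 1)))

def exact_tail_p2_q5(U, n):
    """Exact P(l_1^2 > U) for p = 2, q = 5 on the null hypothesis."""
    b = 0.5 * (n - 5 - 2 + 1)
    lead = (2 * b + 1) * (2 * b + 3) * U ** 2 * (1 - U) ** b / 3.0
    rest = (1 - U) ** (2 * b + 1) * (b * (2 * b + 1) * U ** 2 + 3 * (2 * b + 1) * U + 3) / 3.0
    return lead + rest

def exact_point(n, alpha=0.05, tol=1e-10):
    """Bisection on U in (0, 1); the tail probability decreases from 1 to 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exact_tail_p2_q5(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (10, 20, 50, 100):
    print(n, round(exact_point(n), 3), round(approximate_point(n), 3))
```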


7. CONCLUSION 


In the present paper the exact distribution of the greatest canonical correlation is given
for p = 2 and p = 3, q = 4. Further, a significance test, which may be regarded as exact
for all practical purposes, is given for p = 3 and p = 4, q = 5. The 5% and 1% significance levels
are given for p = 2 and p = 3 for selected values of q, and for p = 4, q = 5, when n is large.

An approximate test, related to the $\chi^2$ test for Wilks's criterion, is proposed and is shown
to be satisfactory for values of p and q for which an exact test is available. The approximate
test is $\chi^2_D = -\{n - \tfrac{1}{2}(p+q+1)\} \log_e (1 - l_1^2)$, where $D = p + q - 1 + \tfrac{1}{2}\{(p-1)(q-1)\}^{2/3}$. Outside
this range of values there is some reason to suppose the approximate test will give
a fairly good result.

When one or more canonical correlations are significant, these tests may be used to test
higher canonical correlations. For example, if $\lambda_1, \ldots, \lambda_{r-1}$ are not zero, $l_r$ may be tested as if
it were the greatest canonical correlation with p' = p - r + 1, q' = q - r + 1.


REFERENCES 


Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Camb. Phil. Soc.
34, 33.

Bartlett, M. S. (1941). The statistical significance of canonical correlations. Biometrika, 32, 29.

Fisher, R. A. (1939). The sampling distribution of some statistics obtained from non-linear equations.
Ann. Eugen., Lond., 9, 238.

Pearson, E. S. & Wilks, S. S. (1933). Methods of statistical analysis appropriate for k samples of two
variables. Biometrika, 25, 353.

Rao, C. R. (1948). Tests of significance in multivariate analysis. Biometrika, 35, 58.

Roy, S. N. (1945). The individual sampling distribution of the maximum, the minimum, and any
intermediate of the p-statistics on the null hypothesis. Sankhya, 7, 133.

Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471.













THE INTERPRETATION OF INTERACTIONS 
IN FACTORIAL EXPERIMENTS 


By E. J. WILLIAMS 


Commonwealth Scientific and Industrial Research Organization, Melbourne 


Where the joint effects of two or more factors are not additive, a simple model is proposed for 
representing the effects. The effects of one factor are assumed to be proportional, rather than equal, at 
different levels of the other factors. 

The main effects of the first factor are given as weighted averages of the simple effects at the 
different levels of the other factors, the weights being the estimated factors of proportionality. The 
weights are given as the latent vector of a matrix of sums of squares and products corresponding to the 
largest latent root of the matrix; the sum of squares for the weighted main effect is a multiple of this 
latent root, and the other latent roots correspond to a partition of the interaction sum of squares. 

The analysis is closely related to the canonical analysis of a set of variates. 

Tests of significance of (a) the residual interactions and (b) the adequacy of a proposed set of weights 
are discussed. 

For the case where the matrix has only two non-vanishing latent roots, the approach of the joint 
distribution of the roots to its limiting form is discussed. The joint probability density is expanded 
as a series of Bessel functions of imaginary argument. Asymptotic formulae for the moments and 
product-moments of the roots are derived. 

Exact tests for the adequacy of a proposed set of weights, when there are only two non-vanishing 
latent roots, are presented. 

The methods of analysis are illustrated with a numerical example. 


I. INTRODUCTION


In a factorial experiment the treatments whose effects are compared comprise all possible 
combinations of two or more factors, each of which is applied at two or more different levels, 
in contrast to experiments in which only one factor is varied, the others being held constant. 
The results of a factorial experiment provide information, not only about the average effects 
of each factor, but also as to the manner in which the effects of one factor vary as other factors 
are changed. The results of such an experiment are most easily interpreted when the effects 
of the different factors are additive; that is, when the effects of one factor are independent 
of variations in other factors. When the effects of different factors are not additive, attention 
must be paid to the way in which they interact. The purpose of this note is to present 
a method which has been found useful in interpreting interaction effects. It is based on 
a simple assumption about the mode of interaction of a pair of factors. 

Where the effects of two or more factors are not additive, it seems reasonable to assume
that the effects of one factor (as measured, for example, by the set of differences among the 
results for different levels of the factor), rather than being constant, are proportional at 
different levels of the other factors. When there are two factors, A and B, and the A effects 
are assumed proportional in this way, at different levels of B, the expected effect of the 
combination of level i of A with level j of B may be expressed in the form 


$$a_i c_j + b_j.$$

The 'main effects' for the different levels of A would then be represented by the constants
$a_i$, which are determinate apart from a constant factor; the constants $c_j$ may be regarded as
weights to be applied to the results at different levels of B, in determining the A effects. In
general, the $c_j$ may be either positive or negative. The effects of B would be given by the
constants $b_j$. It is noted that this representation of the treatment effects is not symmetrical



with respect to the factors A and B. This kind of effect often seems to occur; for instance, 
when A represents different types of fertilizer, and B represents different levels of 
application, in an agricultural experiment. 

If this assumption is consistent with the data obtained from an experiment, interaction 
effects found to exist may be satisfactorily accounted for by the methods of analysis set out 
below. R. A. Fisher in 1935 (see Fisher, 1949, §50) applied such a method in the analysis 
of an experiment on quantity and type of nitrogenous fertilizer. The effects of fertilizer 
type are assumed to be proportional to quantity applied, an assumption which is found to 
fit the facts. It is not necessary to the method, however, even where one factor is quantitative, 
to assume the effects of the other factor to be proportional to quantity, though such an 
assumption, if it proves to be realistic, is certainly to be preferred. In many experiments of 
the type described by Fisher, the effects will not be proportional to quantity of fertilizer, 
but will increase less rapidly than quantity. In general, the weights for the effects of 
factor A at different levels of B are to be estimated from the data, being so chosen as to 
maximize the sum of squares for the ‘main effect’ of A, and hence to minimize the residual 
interaction sum of squares. If the interactions thus determined prove to be non-significant, 
the interpretation of the data may be regarded as satisfactory. 

The earliest use of a method of this kind for analysing a factorial experiment appears to 
be due to Fisher & Mackenzie (1923), in the analysis of a varietal and manurial trial with 
potatoes. It was assumed that the expected yield of any variety-manure combination was 
the product of two constants, representing the variety and the manurial level. It was there 
shown that the sums of squares for main effects and interactions are given by the latent 
roots of a matrix of sums of squares and products. The present problem is also closely allied 
to that of comparison of different scales of measurement, as discussed by Cochran (1943). 
In fact, the same basic methods are used, though there are some additional preliminaries, 
such as the determination of the effects of factor B, the testing of the significance of the 
unweighted interactions, and checking the adequacy of weights chosen a priori. 

Testing the significance of interactions involves the same problems as arise in testing 
departures from collinearity of several populations in discriminant function analysis. While 
the distribution theory involved in such tests is far from complete, the available tests appear 
to be quite adequate for practical purposes. The problem of testing significance will be 
discussed briefly, particular attention being given to the simpler case in which the matrix 
has only two latent roots. The method will be illustrated with a practical example. 


II. DETERMINATION OF MAIN EFFECTS TO MINIMIZE INTERACTIONS 


For simplicity we consider an experiment with two factors, A and B, applied at n and 
p levels respectively, and replicated r times. We write 


$x_{ij}$  mean for level i of A in combination with level j of B,
$x_{i.}$  mean for level i of A over all levels of B,
$x_{.j}$  mean for level j of B over all levels of A.


Without loss of generality we may regard the mean of all results as zero.
In accordance with the model set out above, the expected value of $x_{ij}$ may be expressed as
$$a_i c_j \theta + b_j,$$
where
$$\sum_i a_i = \sum_j b_j = 0. \qquad (1)$$







The introduction of the constant $\theta$, whose significance will appear later, enables $a_i$ and $c_j$
to be chosen to satisfy the further conditions
$$\sum_i a_i^2 = \sum_j c_j^2 = 1. \qquad (2)$$


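The method of least squares then gives the following equations for the constants; treating
the $a_i\theta$ as one set of constants and minimizing
$$\sum_i \sum_j (x_{ij} - a_i c_j \theta - b_j)^2 + 2\alpha \sum_i a_i \theta + 2\beta \sum_j b_j + \gamma \sum_j c_j^2,$$
we have
$$\sum_j c_j (x_{ij} - a_i c_j \theta - b_j) - \alpha = 0, \qquad (3)$$
$$\theta \sum_i a_i (x_{ij} - a_i c_j \theta - b_j) - \gamma c_j = 0, \qquad (4)$$
$$\sum_i (x_{ij} - a_i c_j \theta - b_j) - \beta = 0. \qquad (5)$$
Use being made of the relations (1) and (2), these equations reduce to
$$a_i \theta + \sum_j b_j c_j = \sum_j c_j x_{ij}, \qquad (6)$$
$$c_j \theta = \sum_i a_i x_{ij}, \qquad (7)$$
$$b_j = x_{.j}, \qquad (8)$$
and from (6) and (8),
$$a_i \theta = \sum_j c_j (x_{ij} - x_{.j}). \qquad (9)$$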
j 


From (7) and (9) we may eliminate either the $a_i$ or the $c_j$; it is preferable, as will be seen,
to eliminate the set of variates with the greater number of values. We shall assume that
n exceeds p, and eliminate the $a_i$:
$$c_j \theta^2 = \sum_i \sum_k c_k (x_{ik} - x_{.k})\, x_{ij}. \qquad (10)$$
If
$$t_{jk} = \sum_i (x_{ij} - x_{.j})(x_{ik} - x_{.k}) = \sum_i x_{ij} x_{ik} - n x_{.j} x_{.k}, \qquad (11)$$
$\theta^2$ is seen to be a latent root of the matrix T whose typical element is $t_{jk}$. This matrix is of
order p x p, and of rank p; in general, its rank is the lesser of n - 1, p.
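The two expressions for the elements of T can be checked on a small table. The following sketch assumes $t_{jk} = \sum_i (x_{ij} - x_{.j})(x_{ik} - x_{.k}) = \sum_i x_{ij}x_{ik} - n x_{.j}x_{.k}$; the 3 x 2 table of means is hypothetical (chosen with grand mean zero), as are the function names.

```python
def column_means(x):
    n, p = len(x), len(x[0])
    return [sum(x[i][j] for i in range(n)) / n for j in range(p)]

def t_matrix_centred(x):
    """t_jk as a sum of products of deviations from the column means."""
    n, p = len(x), len(x[0])
    m = column_means(x)
    return [[sum((x[i][j] - m[j]) * (x[i][k] - m[k]) for i in range(n))
             for k in range(p)] for j in range(p)]

def t_matrix_raw(x):
    """t_jk as raw sums of products less n times the product of column means."""
    n, p = len(x), len(x[0])
    m = column_means(x)
    return [[sum(x[i][j] * x[i][k] for i in range(n)) - n * m[j] * m[k]
             for k in range(p)] for j in range(p)]

# Hypothetical table of means x_ij: n = 3 levels of A (rows), p = 2 levels of B (columns).
x = [[4.0, 1.0], [2.0, -2.0], [0.0, -5.0]]
T = t_matrix_centred(x)
print(T)
print(t_matrix_raw(x))
print(T[0][0] + T[1][1])  # trace: the sum of the latent roots
```

The trace equals $\sum_i\sum_j (x_{ij} - x_{.j})^2$, which, multiplied by r, is the sum of squares for the main effect of A together with the interactions.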
Had the $c_j$ been eliminated, $\theta^2$ would have been found as a latent root of the matrix
U, where
$$u_{hi} = \sum_j (x_{hj} - x_{.j})(x_{ij} - x_{.j}). \qquad (12)$$


This matrix, while having the same non-zero latent roots as T, is of order n x n and consequently
has n - p irrelevant zero roots if n exceeds p, and one zero root otherwise.
The sum of the latent roots of either matrix, multiplied by r, is
$$r \sum_i \sum_j (x_{ij} - x_{.j})^2 = r\Big(\sum_i \sum_j x_{ij}^2 - n \sum_j x_{.j}^2\Big),$$
that is, the sum of squares for the main effect of A and for the interactions of A and B. If
$$r\theta^2 = s,$$
then the largest value $s_1$ of s is the maximum value which the weighted sum of squares
between treatments can take. Accordingly, the corresponding weights are those which
minimize the residual sum of squares ($s_2 + s_3 + \ldots$). This weighting may thus be regarded as




giving the 'main effect' of factor A, and $s_1$ as the sum of squares for this weighted main
effect. The other values of s, namely $s_2, s_3, \ldots$, are seen to be orthogonal components of the
sum of squares for residual interaction effects.

If only the value $s_1$ is significant, and the values $s_2, s_3, \ldots$ are non-significant, then the
interactions have been successfully accounted for. However, even where $s_1$ alone does not
account for all the significant effects, the fact that $s_2$ is the maximum component of the
residual interactions, $s_3$ the maximum of the remainder after eliminating the effect corresponding
to $s_2$, and so on, enables the effects corresponding to the largest components of
interaction to be identified, and separately tested for significance.

Corresponding to any latent root, the values of $a_i$ and $c_j$ may be determined. Since, in
practice, only the values corresponding to the largest root are required, the simplest method
of calculation is the iterative method. This may be based on the use of equation (10) or
a similar equation for the $a_i$. Alternatively, equations (7) and (9) may be used alternately
to derive successive approximations for the $a_i$ and $c_j$. Equation (7) may be written as
$$c_j \theta = \sum_i a_i (x_{ij} - x_{.j}),$$
so that the calculations can be performed directly on a table of the values $x_{ij} - x_{.j}$ (or the
corresponding totals). This method obviates the calculation of the matrix T, but the
convergence of the iteration is only about half as rapid as that using equation (10). Either
of these methods simultaneously yields successive approximations to the largest latent
root, as well as the constants.
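The alternating use of equations (7) and (9) may be sketched as follows. This is a minimal illustration, not the author's computing scheme: it assumes the forms $a_i\theta = \sum_j c_j(x_{ij} - x_{.j})$ and $c_j\theta = \sum_i a_i(x_{ij} - x_{.j})$, rescales after each step so that $\sum a_i^2 = \sum c_j^2 = 1$, and starts from equal weights (which would fail if the equal-weight comparison happened to vanish). The function name and the table of means are hypothetical.

```python
import math

def weighted_main_effect(x, iterations=50):
    """Alternate equations (9) and (7) on the table of x_ij - x_.j."""
    n, p = len(x), len(x[0])
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(p)]
    y = [[x[i][j] - col_means[j] for j in range(p)] for i in range(n)]
    c = [1.0 / math.sqrt(p)] * p          # starting weights (an arbitrary choice)
    theta = 0.0
    for _ in range(iterations):
        # Equation (9): a_i * theta = sum_j c_j (x_ij - x_.j), then rescale a.
        a = [sum(c[j] * y[i][j] for j in range(p)) for i in range(n)]
        norm_a = math.sqrt(sum(v * v for v in a))
        a = [v / norm_a for v in a]
        # Equation (7): c_j * theta = sum_i a_i (x_ij - x_.j), then rescale c.
        c = [sum(a[i] * y[i][j] for i in range(n)) for j in range(p)]
        theta = math.sqrt(sum(v * v for v in c))
        c = [v / theta for v in c]
    return a, c, theta

# Hypothetical table of means: 3 levels of A (rows), 2 levels of B (columns).
x = [[4.0, 1.0], [2.0, -2.0], [0.0, -5.0]]
a, c, theta = weighted_main_effect(x)
print(a, c, theta ** 2)
```

In this artificial table the A effects at the two levels of B are exactly proportional, so $\theta^2$ accounts for the whole of $\sum_i\sum_j (x_{ij} - x_{.j})^2$ and no residual interaction remains.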


III. DETERMINATION OF INTERACTION COMPONENTS WHEN UNWEIGHTED 
MAIN EFFECTS ARE REQUIRED, AND IN OTHER CASES 


Sometimes the unweighted main effects for each factor are relevant, though such a situation 
does not seem often to arise. In such a case the interactions, defined in the ordinary way, 
may be separated into components representing effects of decreasing magnitude. This 
enables an interpretation of any significant interactions to be made. As the actual pro- 
cedures, both algebraic and computational, do not differ in principle from those used in the 
case discussed above, only the results need be presented. 

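The expected value of $x_{ij}$ is
$$c_i d_j \theta + a_i + b_j,$$
where
$$\sum_i a_i = \sum_j b_j = \sum_i c_i = \sum_j d_j = 0, \qquad (1')$$
and
$$\sum_i c_i^2 = \sum_j d_j^2 = 1. \qquad (2')$$
The equations for the $c_i$ and $d_j$ are
$$c_i \theta = \sum_j d_j (x_{ij} - x_{.j}), \qquad (13)$$
$$d_j \theta = \sum_i c_i (x_{ij} - x_{i.}). \qquad (14)$$

Values of $\theta^2$ are found as latent roots of the matrix V, where
$$v_{jk} = \sum_i (x_{ij} - x_{i.} - x_{.j})(x_{ik} - x_{i.} - x_{.k})
        = \sum_i (x_{ij} - x_{i.})(x_{ik} - x_{i.}) - n x_{.j} x_{.k}. \qquad (15)$$
The quantities $s_1, s_2, s_3, \ldots$ are the canonical components of the interaction sum of squares,
whose significance may be separately tested.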










Another alternative, of more practical importance, is to weight both main effects. This 
method derives from the assumption that the expected effect (either absolute, or as a 
deviation from the general mean) is the product of constants representing the effects of the 
two factors. Such an assumption was made by Fisher & Mackenzie (1923). 


The sums of squares for main effects and interactions are then given by the latent roots
of the matrix W, where
$$w_{jk} = \sum_i x_{ij} x_{ik}.$$


Since these different cases are all treated in the same way, and the first described seems 
most important in practice, these others will not be considered further. 


IV. EXTENSION TO DATA WITH PARTIALLY CONFOUNDED INTERACTIONS 


The method of analysis outlined above applies where the experiment has been so arranged
that all treatment comparisons are of equal accuracy (treatment effects being either
unconfounded, or all confounded to the same extent). Assuming no interaction effects, we
find the matrix of expectations of the elements $t_{jk}$ in this case to be a multiple of I, the unit
matrix. The analysis leads to the solutions of the matrix equation
$$|T - \theta^2 I| = 0.$$


In cases where the (unweighted) interactions have been partially confounded, the method
of analysis has to be somewhat modified. For simplicity, we consider only the case where
all such interactions have been confounded to the same extent. We suppose that the interaction
effects are estimated with an efficiency factor E; that is, the variance of interaction
comparisons is 1/E times that of main effects. For an n x n factorial experiment, using blocks
of n experimental units, where it is possible to confound all the interaction effects equally
in a multiple of n - 1 replications, without confounding any of the main effects, we have
$$E = \frac{n-2}{n-1}.$$

In the analysis, use is made of the estimates of the effects of one factor at each level of the
other (which, following Yates, we shall call simple effects). These estimates reduce to the
mean $x_{ij}$ only when there is no confounding; their calculation in other cases is standard
procedure (see, for example, Yates, 1937), and will not be discussed here. In determining
the weighting which maximizes the sum of squares for weighted main effects, we find that
quantities $t_{jk}$ are involved, defined as above, except that they are, in the general case,
functions of the simple effects and not of the treatment means. The matrix of expectations
of the $t_{jk}$ is now proportional to J, where
$$EJ = \begin{pmatrix}
1-e & -e & \cdots & -e \\
-e & 1-e & \cdots & -e \\
\vdots & \vdots & \ddots & \vdots \\
-e & -e & \cdots & 1-e
\end{pmatrix}$$
and
$$e = \frac{1-E}{p}.$$













From this result it follows that, corresponding to any set of weights $w_j$, the sum of squares
for weighted main effects is
$$\sum_j \sum_k w_j w_k t_{jk},$$
provided
$$\frac{1}{E} \Big\{ \sum_j w_j^2 - e \Big(\sum_j w_j\Big)^2 \Big\} = 1.$$

We therefore choose the scale of the weights to satisfy this condition.

Moreover, it can be shown that two sets of weights, $w_j$ and $w_j'$, correspond to independent
comparisons if and only if
$$\sum_j w_j w_j' - e \Big(\sum_j w_j\Big)\Big(\sum_j w_j'\Big) = 0,$$
and that consequently the correlation between the comparisons corresponding to the two
sets of weights will be
$$\frac{1}{E} \Big\{ \sum_j w_j w_j' - e \Big(\sum_j w_j\Big)\Big(\sum_j w_j'\Big) \Big\}.$$

The solution of the normal equations leads to the matrix equation
$$|T - \theta^2 J| = 0;$$
and the sums of squares for the weighted main effects and for the canonical components of
interaction are given by the values of $r\theta^2$.

A typical example is the 3 x 3 factorial, laid out in blocks of three units. With two
replications, it is possible to confound all the interaction comparisons by half. Then $E = \tfrac{1}{2}$,
and we have
$$J = \frac{1}{3} \begin{pmatrix} 5 & -1 & -1 \\ -1 & 5 & -1 \\ -1 & -1 & 5 \end{pmatrix}.$$


Such a case provides the basis for the numerical example. 
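The matrix J for the 3 x 3 example can be generated from the efficiency factor. The sketch below assumes $e = (1-E)/p$ and $EJ = I - e\,\mathbf{1}\mathbf{1}'$, which reproduces the matrix quoted for $E = \tfrac{1}{2}$, p = 3, and the scaling condition $(1/E)\{\sum w_j^2 - e(\sum w_j)^2\} = 1$ for the weights; the function names are hypothetical.

```python
def confounding_matrix(p, E):
    """J with E*J = I - e*(matrix of ones), taking e = (1 - E)/p (assumed)."""
    e = (1.0 - E) / p
    return [[((1.0 if j == k else 0.0) - e) / E for k in range(p)] for j in range(p)]

def scale_condition(w, E):
    """(1/E) * (sum w^2 - e (sum w)^2); the weights are scaled so that this equals 1."""
    e = (1.0 - E) / len(w)
    return (sum(v * v for v in w) - e * sum(w) ** 2) / E

J = confounding_matrix(3, 0.5)
print([[round(3 * v, 6) for v in row] for row in J])   # 3J, to compare with the text

# Equal weights (an unweighted main effect) and a pure interaction contrast.
print(scale_condition([3 ** -0.5] * 3, 0.5))
print(scale_condition([2 ** -0.5, -(2 ** -0.5), 0.0], 0.5))
```

With equal weights the condition gives 1, since main effects are unconfounded; for a contrast among the levels of B it gives 1/E = 2, reflecting the halved accuracy of the interaction comparisons.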


V. ONE FACTOR AT TWO LEVELS ONLY 


When factor A is at two levels only, the matrix T has only one latent root. The weights for
any level of factor B are proportional to the observed differences between two levels of A. 
The weighted main effect accounts for all the effects of factor A, and interactions are 
annihilated. The simplicity of this case arises from the fact that the equations of estimation 
of all the constants are linear; consequently, the distribution theory and significance tests 
required are those occurring in the analysis of variance. 

In particular, it is a simple matter to test whether the weights observed differ significantly 
from any set of weights chosen a priori, and thus, provided the weights are sufficiently 
accurately determined, to derive fiducial limits at any given level of probability for the true 
relative magnitudes of the A effects at different levels of B. 

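Suppose that the assigned weights are $w_j$, scaled so that
$$\sum_j w_j^2 = 1.$$
Then if
$$y_j = x_{1j} - x_{2j},$$
the sum of squares for the weighted A effect is
$$\tfrac{1}{2} r \Big(\sum_j w_j y_j\Big)^2.$$

The remainder, or interaction, is
$$\tfrac{1}{2} r \Big\{ \sum_j y_j^2 - \Big(\sum_j w_j y_j\Big)^2 \Big\},$$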










a sum of squares with p - 1 degrees of freedom, which may be tested for significance against
the error mean square. Sets of values of $w_j$ which render the remainder significant may then
be rejected as discordant with the data. No such rejection is possible unless $\tfrac{1}{2} r \sum_j y_j^2$, treated
as a sum of squares with p - 1 degrees of freedom, is significant.
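For the two-level case the partition is simple enough to compute directly. The sketch below assumes the weighted sum of squares $\tfrac{1}{2}r(\sum w_j y_j)^2$ and remainder $\tfrac{1}{2}r\{\sum y_j^2 - (\sum w_j y_j)^2\}$, with weights scaled so that $\sum w_j^2 = 1$; the differences $y_j$, the value of r and the function name are hypothetical.

```python
def two_level_partition(y, w, r):
    """Return (weighted A sum of squares, remainder) for assigned weights w."""
    wy = sum(wj * yj for wj, yj in zip(w, y))
    weighted = 0.5 * r * wy ** 2
    remainder = 0.5 * r * (sum(yj * yj for yj in y) - wy ** 2)
    return weighted, remainder

# Hypothetical differences y_j = x_1j - x_2j at p = 2 levels of B, with r = 2.
y = [3.0, 4.0]
print(two_level_partition(y, [0.6, 0.8], r=2))   # weights proportional to the y_j
print(two_level_partition(y, [1.0, 0.0], r=2))   # weights chosen a priori
```

Weights proportional to the observed differences leave no remainder, in line with the statement that the interactions are annihilated; any other assigned weights transfer part of the total $\tfrac{1}{2}r\sum y_j^2$ to the remainder, which is then tested against the error mean square.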


VI. TESTS OF SIGNIFICANCE


The tests of significance relevant to the results of the analysis of interaction effects will
consist of tests of the adequacy of the assumptions made in carrying out the analysis. The
basic assumption is that, with main effects suitably defined, interactions do not exist. This
is equivalent to assuming that, if
$$\phi = \{\phi_{ij}\}$$
is the matrix of expected effects of factor A, then $\phi$ is of rank 1; that is, the matrix
$$\Phi = r\phi'\phi$$
has only one non-zero latent root. We shall denote this latent root by $\lambda$.

In this case, the weights $c_j$ would be proportional to the set of elements of any row of the
effect matrix $\phi$. If these weights were known, the significance of the interactions (or, what
is equivalent, the existence of other non-zero roots of $\Phi$) could be tested by means of an
analysis of variance. The sum of squares for the weighted main effect would then be proportional
to a non-central $\chi^2$, with parameter $\lambda$, and the residual interactions would be
distributed independently of $\lambda$.

When the weights are not known but estimated from the data, the result of sampling
errors in the weights is that all the sample latent roots are correlated, and have distributions
depending on the population latent root. Some allowance must then be made for the effect
of $\lambda$ on the sample roots representing interactions. It will be shown that, provided the
population latent root is sufficiently large, a fairly accurate test for other roots can be made,
based on the observed interactions. Such a test can be seen to be equivalent to a 'test for
collinearity', as described by Fisher (1938).

Another aspect of the data which should be tested is the adequacy of any system of 
weights chosen a priori, or, in general, fiducial limits for the true weights. Here again, 
a fairly accurate test is possible, provided the population root is large. Fisher (1940) has 
discussed such tests as applied to discriminant functions. 

Since the method of analysis described is of no interest unless effects of treatment factor A
are known to exist, a test of the main effect is clearly not relevant. However, the fact that
such effects exist is likely to ensure that the population latent root is large, so that
significance tests are more satisfactory than they might otherwise be.

We shall assume for convenience that the general variance in the population is unity.
If the population root $\lambda$ is large, the distribution of the largest sample root $s_1$
approximates a non-central $\chi^2$ distribution, with $n+p-2$ degrees of freedom and parameter
$\lambda$. Any proposed set of weights will lead to a sum of squares, $s_w$, for the weighted main
effect of factor A which, by hypothesis, will have the same parameter $\lambda$, and $n-1$ degrees
of freedom. Hence, in this case, the adequacy of any proposed set of weights can be tested by the
difference $s_1 - s_w$, a central $\chi^2$ with $p-1$ degrees of freedom.

In the same way, since the sum of squares for A effects and for interactions is a non-
central $\chi^2$ with $(n-1)p$ degrees of freedom, the sum of squares for interactions may be
regarded as having a $\chi^2$ distribution with $(n-2)(p-1)$ degrees of freedom, and may be
tested for significance accordingly.
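
The degrees-of-freedom bookkeeping in the last two paragraphs can be checked mechanically. The following is a minimal sketch; the function name and the dictionary keys are illustrative and do not come from the paper.

```python
def interaction_df_partition(n: int, p: int) -> dict:
    """Degrees of freedom for the tests described above (illustrative names).

    n: number of levels of factor A; p: number of levels of factor B.
    """
    total = (n - 1) * p               # A effects plus interactions
    weighted_main = n + p - 2         # largest root s_1 (p - 1 weights fitted)
    assigned_main = n - 1             # s_w, for weights fixed in advance
    return {
        "total": total,
        "weighted_main": weighted_main,
        "assigned_main": assigned_main,
        "weights_test": weighted_main - assigned_main,   # s_1 - s_w
        "residual_interaction": total - weighted_main,   # remaining roots
    }
```

For a 3 x 3 experiment this gives 4 degrees of freedom for the weighted main effect and 2 for the residual interaction.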








72 The interpretation of interactions in factorial experiments 


VII. THEORETICAL BASIS OF THE TESTS APPLIED WHEN λ IS LARGE, AND DISCUSSION
OF THE APPROACH OF THE DISTRIBUTION TO ITS LIMITING FORM


The case of the population root $\lambda$ large can be seen to give the distributions stated above
if we consider the population 'main effect'. The correlation between the observed main effect and
the population main effect will be near to unity, when $\lambda$ is large, and since the interaction
components are orthogonal to the observed main effect, their correlation with the population
effect will be nearly zero. In deriving the observed main effect, $p-1$ independent constants
are fitted, so that the effect has in all $n+p-2$ degrees of freedom. The interaction sum of
squares will then have $(n-2)(p-1)$ degrees of freedom, and be virtually independent of $\lambda$.
The tests given then follow. (This argument can of course be extended to the case where two
or more population roots are large, in deriving the limiting distribution of the sum of the
remaining sample roots.)

It is instructive to consider the approach of the joint distribution to the limiting form, and
this will now be done for the case where there are only two sample roots which are not
identically zero. It is hoped to generalize the results given here in later communications.
For this case, let $p = 2$. The joint probability density of the sample roots $s_1$ and $s_2$,
when the population root $\lambda = 0$, is (see, for example, Hsu, 1939)

$$\frac{1}{4(n-3)!}\, e^{-\frac12(s_1+s_2)}\, (s_1 s_2)^{\frac12(n-4)}\, (s_1 - s_2)\, ds_1\, ds_2. \tag{16}$$


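The population root introduces a factor (see Bartlett, 1947)

$$e^{-\frac12\lambda} \sum_{i=0}^{\infty} \frac{(\frac12\lambda)^i}{i!}\, m_i, \tag{17}$$

where

$$m_i = \frac{\Gamma[\frac12(n-1)]}{\pi\, 2^i\, \Gamma[\frac12(n-1)+i]} \sum_{k=0}^{i} \frac{\Gamma(i-k+\frac12)\,\Gamma(k+\frac12)}{\Gamma(i-k+1)\,\Gamma(k+1)}\, s_1^{i-k} s_2^{k}.$$

Clearly, when $\lambda = 0$, $E(m_i) = 1$.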





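The limiting distribution is (Hsu, 1940)

$$\frac{1}{2^{n-1}\,\Gamma[\frac12(n-2)]}\, e^{-\frac12(\lambda+s_1+s_2)}\, s_1^{\frac12(n-2)}\, s_2^{\frac12(n-4)} \sum_{i=0}^{\infty} \frac{(\frac14\lambda s_1)^i}{i!\,\Gamma(\frac12 n+i)}\, ds_1\, ds_2$$

$$= \frac{1}{2^{\frac12 n}\,\Gamma[\frac12(n-2)]}\, e^{-\frac12(\lambda+s_1+s_2)} \left(\frac{s_1}{\lambda}\right)^{\frac14(n-2)} s_2^{\frac12(n-4)}\, I_{\frac12(n-2)}\{\sqrt{(\lambda s_1)}\}\, ds_1\, ds_2, \tag{18}$$

$s_1$ and $s_2$ now being independent.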
The joint distribution function can conveniently be expressed as an asymptotic series,
with leading term equal to the limiting distribution function, the series consisting of terms
involving Bessel functions of increasing order. In deriving the series, it is to be remembered
that, while $s_1 = O(\lambda)$, $s_2 = O(1)$.













E. J. WILLIAMS 73


Put $\lambda s_1 = u_1^2$, $\lambda s_2 = u_2^2$;
then the joint density is











(19) 


uf + uz 
exp| -4(A+ > )| (du, )H"-2+26 D(4) /m, 
= ui” uy -3 1 a 
2Ke—OT4(n — 2) A (1- > » NTR 1) +4] (Z) 
m 12 ri—k+4) ney) 
sho ol (i—k+ 1) Tk +1) \u, 
We now sum the series in (19), with respect to $i$ for given values of $k$. In illustration of the
method, we give the summation for $k = 0$. The sum is


and 





(n—2)4+27 4(n—2)+27 
: (3) nwew _¢ (3) ri+HTGnsi) 
Hot! Pi¥(m—1) +e) Ti+1) Soe! (4n42) [4+ 1) Ti Mm—-1) +74] 


- (z ae 


= Sarqasa i »$(n—2); $n+7; 1) 











Hn—2)+27 
_3() (1+ fem ()?1.3.(m—2)n 
~ tT Gn +i) I (4n+i)  2'(4n4+i)(4ntit+]1) 


1.(n—2) 1.3.(n—2)n 
= Tyn_a(U) + 2a, T,,(U) + Qu)? Dying 2)(Uy) + «++ (20) 


On summing the series for all values of k, and collecting terms of the same order, we have 
for the summed terms in (19), 





3(n—2)n nuz 
Tyn—2) + + a tint = ua i +} if a a = — Jn+a t+ a8 Lynt+ aa ie + 
s(n _ * n(n + 2) 3n(n +2) ug 3(n +2) us 
pis coe the Ss 9 
+ 1603 Fyinsa) Tr i + l6us Tyn+ vet aa8 A 2) + i+ ete., (21) 


where the argument of the Bessel functions is omitted for convenience. The numerical
coefficients in each set of terms are seen to be the same as those occurring in the $m_i$, namely,

$$\frac{1}{\pi}\, \frac{\Gamma(i-k+\frac12)\,\Gamma(k+\frac12)}{\Gamma(i-k+1)\,\Gamma(k+1)}.$$

The series (21), substituted for the sum in (19), gives the required representation of the
joint probability density with terms of decreasing order of magnitude in $\lambda$. From this
representation, the joint moments of $s_1$ and $s_2$ may be determined as asymptotic series in
$\lambda$.

In the calculation of moments, and for other purposes, it can be assumed that the ranges
of integration of $s_1$ and $s_2$ (and hence also of $u_1$ and $u_2$) are independent. The errors
thus introduced decrease exponentially with $\lambda$, and consequently do not enter into the
asymptotic formulae.

The general term in the expansion of the probability density is of the form 


k exp| - a(r + s ai *) | wf tg I(u,), (22) 


and similar terms will appear in the integrand in the determination of any product-moment. 



















The integral of such an expression (see, for example, Watson, 1944, p. 393) is 








Ke. QHP+a--2 Ko tat) Hist poe F,(4(t—p) +1; t+1; — 4A), (23) 
where ${}_1F_1$ is a generalized hypergeometric function (Watson, 1944, p. 100).
The function P(e; f; —4A) (e<f) 
has the asymptotic expansion 
2 a od (24) 
I(f—e)\AT(f—e) Att(f—e—1)  2!Aet+?P(f—e-—2) 


as may be seen by considering the differential equation which it satisfies. Actually it is
easier to consider the function

$$e^{\frac12\lambda}\, {}_1F_1(e; f; -\tfrac12\lambda) = {}_1F_1(f-e; f; \tfrac12\lambda),$$

which leads to equivalent results.
The product-moments may now be derived.
(gn +g) Ui4(n—2) +h) 
F,(-—g; 4n; —4A)+: 
Tae) Tae ayy lO os H+: 
1 (* - *) T'(4n+g) U[4(n— 2) +h] 
2\n )~ Tan) Ti(n—2)] 
1T[}(n—2)+g) T(gnt+h) 2, _ a ) = 
~9 T'(4n) P'[4(n—2)] F;( g+1; $n; 4A) +: ’ ( ) 
with similar terms which may be derived in an obvious way. On replacing the ${}_1F_1$ by their
asymptotic expansions, we find, after some reduction, and replacing the $u$ by their values
in terms of the $s$,


B(sfst) = Ae(n—2), (1-4 





E(ufpup) = (22)0+*( 





Fi(—g+1; $(m+2); — 4A) 








297+9(n—2)—h 
A 


+ ja| 20+ 2atin—4) +2 (nt 10n + 20)-2(n—2) (n— 6) — 2g%h — gh(n— 6) — 2 _™ 


+ x [$9° + 295(m — 6) + 493(3n? — 42n + 124) + 493(n3 — 30n? + 224n — 432) 
— $97(3n3 — 54n? + 288n — 416) + 49(n3 — 15n? + 74n — 96) — 29th — 2g°h(n — 8) 
— $9°h(n? — 18n + 82) + $gh(n?— 15n+70)—h(n + 3) —g*h? — $gh?(n — 10) —$h?—$h3]+.. | . 
(26) 
where $(n-2)_h = (n-2)\,n\,(n+2) \ldots (n+2h-4)$.

The expressions for the moments are somewhat complicated, and it is advisable to make
use of the numerous checks which exist. Any symmetric polynomial in $s_1$ and $s_2$ has as
expectation a polynomial in $\lambda$, of degree equal to the highest power (say $f$) of $s_1$
occurring. Consequently, in the asymptotic expansion of the expectation, all terms beyond the
$(f+1)$th vanish. These results provide the required checks, which may be enumerated as
follows:

    Terms in expansion    Check polynomials
            2             $1$
            3             $s_1 + s_2$,  $s_1 s_2$
            4             $s_1^2 + s_2^2$,  $s_1^2 s_2 + s_1 s_2^2$,  $s_1^2 s_2^2$





Explicitly, we have

$$E(s_1 + s_2) = \lambda + 2(n-1),$$
$$E(s_1 s_2) = (n-2)[\lambda + (n-1)],$$
$$E(s_1^2 + s_2^2) = \lambda^2 + 2\lambda(n+2) + 2(n-1)(n+2),$$
$$E(s_1^2 s_2 + s_1 s_2^2) = (n-2)[\lambda^2 + 3\lambda(n+1) + 2(n-1)(n+1)],$$
$$E(s_1^2 s_2^2) = (n-2)\,n\,[\lambda^2 + 2\lambda(n+1) + (n-1)(n+1)].$$
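
One further cross-check on these expectations is available because $s_1 + s_2$ is the total sum of squares for A effects and interactions, which (with unit variance and $p = 2$) is a non-central $\chi^2$ with $2(n-1)$ degrees of freedom and parameter $\lambda$. The sketch below compares its first two raw moments with the listed check polynomials; the function names are illustrative, and the standard mean $k + \lambda$ and variance $2(k + 2\lambda)$ of a non-central $\chi^2$ with $k$ degrees of freedom are assumed.

```python
def check_polynomial_means(lam, n):
    """Expectations of three of the check polynomials, as listed in the text."""
    return {
        "s1+s2": lam + 2 * (n - 1),
        "s1*s2": (n - 2) * (lam + (n - 1)),
        "s1^2+s2^2": lam ** 2 + 2 * lam * (n + 2) + 2 * (n - 1) * (n + 2),
    }


def noncentral_chi2_moments(df, lam):
    """First two raw moments of a non-central chi-square with unit variance."""
    mean = df + lam
    variance = 2 * (df + 2 * lam)
    return mean, variance + mean ** 2


def trace_moments_agree(lam, n):
    """True if E(s1+s2) and E[(s1+s2)^2] match the non-central chi-square."""
    e = check_polynomial_means(lam, n)
    mean, second = noncentral_chi2_moments(2 * (n - 1), lam)
    second_from_text = e["s1^2+s2^2"] + 2 * e["s1*s2"]
    return e["s1+s2"] == mean and second_from_text == second
```

With integer arguments the comparison is exact, so `trace_moments_agree(lam, n)` can be evaluated over a grid of values without rounding concerns.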


(a) Test for significance of interaction 


Since $s_2$ is not distributed independently of the population root $\lambda$, it would be
desirable to find some function of $s_1$ and $s_2$, distributed independently of $\lambda$, which
would provide a fully satisfactory test for the significance of interactions. Such a function
does not appear to exist. From the form of the limiting distribution for large $\lambda$ it can be
seen that the function would be equivalent to $s_2$ at this limit. Below is given a function whose
mean is independent of $\lambda$ for $\lambda$ large, and tends to $s_2$, but its other moments are
functions of $\lambda$. It therefore appears that only an approximate test is ever possible.

The moments of $s_2$ are





h h®+3h h®+9h? + 2h(n+3) 
A 22 218 sgt 


Bist) = (n—2),{1-5- 


By comparing moments, it can be seen that 
1 n+3 
ae ae 


n(n — 2) 
r2 





has approximately a y? distribution with n—2+ degrees of freedom. 
In practice, A being unknown, we may use 
4, 4 
Sts nt 2) 8?” 
whose mean is n—2+O(A-), 


4(n—2) | 2(n—2)(n+1)(n+4) 
r na 


The means, variances and covariance of s, and 8, are also of interest. We have 
ae 8 
Bis,) = A+n+(n—2(Q4+ 545 ), 
E(8,) = (n-2)(1-3 id ’ 


4(n—2) 2(n—2)(n+7)_ 4(m—2)(7n+22) _ 
r 


and variance (n—2)n— 





+0(A°). 


V(s,) = 4A+2n 








72 73 andy 
V(s) = a(n—2)-<"— 2) _ Ben Pym 8) _ Aen eee. 





cov (8,, 8.) = (n— 2) 4 we i YN td ce 


A As 













The correlation between $s_1$ and $s_2$ is accordingly, to the leading order,

$$\left\{\frac{2(n-2)}{\lambda^3}\right\}^{1/2},$$

and can clearly be ignored if $\lambda$ is large, and $n$ small; indeed, since $\lambda$ must be
$O(n^{1/2})$ to be significant, as is shown below, a significant value of $\lambda$ implies a
correlation of at most $O(n^{-1/4})$ for large $n$.


(b) Large values of n 


Another instructive limiting case is that of large $n$. While not of practical importance for
the type of data discussed here, it does give some indication of the way in which $s_1$ and $s_2$
depend on the population root $\lambda$, while proving more tractable than the general case.


If we put

$$s_1 = n + y_1\sqrt{(2n)}, \quad s_2 = n + y_2\sqrt{(2n)}, \quad \lambda = \eta\sqrt{(2n)},$$

the limiting distribution is found to be

$$\frac{1}{2\sqrt{\pi}} \exp\{-\tfrac12[y_1^2 + y_2^2 - \eta(y_1+y_2) + \eta^2]\}\, (y_1-y_2)\, I_0[\tfrac12\eta(y_1-y_2)]\, dy_1\, dy_2.$$

The derivation shows, incidentally, that, to be shown as significant, $\lambda$ must be of order
$n^{1/2}$. The interesting feature of this distribution is that, as can readily be seen by a
change of variables, $y_1 + y_2$ and $y_1 - y_2$ are distributed independently, though both are
dependent on the parameter $\eta$. When we put

$$z_1 = y_1 + y_2, \quad z_2 = y_1 - y_2,$$
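the joint distribution is

$$\frac{1}{2\sqrt{\pi}}\, e^{-\frac14(z_1-\eta)^2}\, dz_1 \cdot \tfrac12\, e^{-\frac14(z_2^2+\eta^2)}\, I_0(\tfrac12\eta z_2)\, z_2\, dz_2.$$

The sum $z_1$ has a normal distribution with mean $\eta$ and variance 2; $z_2$ has a distribution
which, when $\eta$ is zero, becomes that of $\sqrt{(2\chi^2)}$ for two degrees of freedom, and
which, for large $\eta$, may be put in the asymptotic form

$$\frac{1}{2\sqrt{\pi}} \left(\frac{z_2}{\eta}\right)^{1/2} e^{-\frac14(z_2-\eta)^2}\, dz_2.$$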


The mean value of z, can best be found from the series 


Pee ee 2 
Tg Bat” 2a 
and the variance from 2-5-5 -4- 


It is seen that, when $\eta$ is large, $z_1$ and $z_2$ are distributed with approximately the same
mean and variance, so that $y_2$ becomes irrelevant for estimating $\eta$.










VIII. TEST FOR ADEQUACY OF ASSIGNED WEIGHTS


Corresponding to any set of weights $w_j$ for the effects of factor A at the $j$th level of factor
B, there will be a sum of squares $s_w$ for the weighted main effect of A. Clearly, any criterion
for testing the concordance of these weights with the experimental results will involve the
difference $s_1 - s_w$. A criterion of this kind can, of course, only be applied after it has been
shown that interactions have been accounted for by the weighting applied; or, in other
words, that there is no evidence for the existence of more than one population latent root,
as indicated by the sample roots $s_2, s_3, \ldots$.

When $p = 2$ and there are only two roots $s_1$ and $s_2$, any value of $s_w$ must lie between them.
The simultaneous distribution of these three quantities may be found from the following
considerations. A test of the weights $w_j$ is actually a test of the hypothesis that the non-zero
population root $\lambda$ corresponds to just these weights. Under these conditions, $s_w$ is seen to
be a sufficient statistic for $\lambda$. Its distribution is, in fact, that of a non-central $\chi^2$
with $n-1$ degrees of freedom and parameter $\lambda$. The simultaneous conditional distribution of
$s_1$ and $s_2$, for fixed $s_w$, is therefore independent of $\lambda$, depending only on the given
value of $s_w$.

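The simultaneous distribution of $s_1$, $s_2$ and $s_w$ is found to be

$$\frac{1}{4(n-3)!}\, e^{-\frac12(s_1+s_2)}\, (s_1 s_2)^{\frac12(n-4)}\, (s_1-s_2)\, ds_1\, ds_2 \times \frac{ds_w}{\pi\sqrt{[(s_1-s_w)(s_w-s_2)]}}\; e^{-\frac12\lambda} \left\{1 + \frac{\frac12\lambda s_w}{n-1} + \frac{(\frac12\lambda s_w)^2}{2!\,(n-1)(n+1)} + \ldots\right\},$$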
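so that the conditional distribution of $s_1$ and $s_2$ is

$$\frac{1}{2^{\frac12(n-1)}\sqrt{\pi}\;\Gamma[\frac12(n-2)]}\; \frac{e^{-\frac12(s_1+s_2-s_w)}\, (s_1 s_2)^{\frac12(n-4)}\, (s_1-s_2)}{s_w^{\frac12(n-3)}\, \sqrt{[(s_1-s_w)(s_w-s_2)]}}\; ds_1\, ds_2.$$

We now make a change of variables, putting

$$v_1 = \frac{(s_1-s_w)(s_w-s_2)}{s_w}, \qquad v_2 = \frac{s_1 s_2}{s_w}.$$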
Sy 


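The simultaneous probability density of $v_1$ and $v_2$ is found to be

$$\frac{1}{\sqrt{(2\pi)}}\, e^{-\frac12 v_1}\, v_1^{-\frac12}\, dv_1 \times \frac{1}{2^{\frac12(n-2)}\,\Gamma[\frac12(n-2)]}\, e^{-\frac12 v_2}\, v_2^{\frac12(n-4)}\, dv_2,$$

so that $v_1$ is distributed as $\chi^2$ with 1 degree of freedom, while $v_2$ is independently
distributed as $\chi^2$ with $n-2$ degrees of freedom.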
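
As a small illustration of this change of variables, the sketch below computes $v_1 = (s_1 - s_w)(s_w - s_2)/s_w$ and $v_2 = s_1 s_2/s_w$ and exposes the identity $v_1 + v_2 = s_1 + s_2 - s_w$, which follows on expanding the product. The function name is illustrative, and exact rational inputs are used only to make the identity visible without rounding.

```python
from fractions import Fraction


def weight_test_statistics(s1, s2, sw):
    """Return (v1, v2) for roots s1 >= s2 and a weighted sum of squares sw.

    sw must lie between the two roots (the case p = 2 discussed above).
    """
    if sw <= 0 or not (s2 <= sw <= s1):
        raise ValueError("expected 0 < s2 <= sw <= s1")
    v1 = (s1 - sw) * (sw - s2) / sw   # test of the assigned weights, 1 d.f.
    v2 = s1 * s2 / sw                 # test of residual interaction, n - 2 d.f.
    return v1, v2


# Illustrative values only: s1 = 10, s2 = 2, sw = 8.
V1, V2 = weight_test_statistics(Fraction(10), Fraction(2), Fraction(8))
```

When `sw` equals `s1` the first statistic vanishes and the second reduces to `s2`.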

While $v_1$ and $v_2$ formally are symmetric functions of $s_1$ and $s_2$, it can be seen that,
from the circumstance that $s_w$ will, for acceptable hypotheses, lie in the neighbourhood of
$s_1$, $v_1$ will depend mainly on $s_1$ and approximate to $s_1 - s_w$, while $v_2$ will depend
mainly on $s_2$. Thus, $v_2$ provides a test of the residual interaction effects, while $v_1$
provides an independent test of the adequacy of the chosen set of weights.

It will be noted that the test provided by $v_2$, while exact, is somewhat 'fluid', in that it
depends on the choice of $s_w$. In practice this does not cause any difficulty; any set of weights
which makes $v_1$ or $v_2$ significant may be regarded as unacceptable. (An adjustment can be
made to the significance levels used for each test, to ensure that chance rejection due to
either cause is of the selected frequency. If the significance probability level is $\alpha$, we
may take probability levels $\alpha_1$ and $\alpha_2$ for the two tests, such that
$\alpha_1 + \alpha_2 - \alpha_1\alpha_2 = \alpha$, since the tests are independent.)
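
The adjustment of levels in the parenthesis can be made concrete. The sketch below assumes, purely for illustration, that the two tests are given equal levels; the text only requires $\alpha_1 + \alpha_2 - \alpha_1\alpha_2 = \alpha$, so any other split satisfying that relation would serve equally well.

```python
import math


def combined_level(alpha1, alpha2):
    """Chance that at least one of two independent tests rejects."""
    return alpha1 + alpha2 - alpha1 * alpha2


def equal_split(alpha):
    """Equal per-test levels whose combined level is alpha (an assumed choice)."""
    a = 1.0 - math.sqrt(1.0 - alpha)
    return a, a
```

For an overall level of 0.05 the equal split gives per-test levels slightly above 0.025.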

The lower bound for $v_2$ is $s_2$, corresponding to the choice of $s_w = s_1$. The systematic use
of $s_2$, regarded as having a $\chi^2$ distribution with $n-2$ degrees of freedom, results in
under-estimation of significance, as has already been indicated above (§ VII (a)).

The criterion $v_1$ may be used to determine fiducial limits for $s_w$, and hence for the
corresponding weights; however, it will be noted that $v_1$ is not a monotonic function of $s_w$,
as it has a maximum value

$$(\sqrt{s_1} - \sqrt{s_2})^2,$$

when $s_w = \sqrt{(s_1 s_2)}$.

Hence, for sufficiently high probability levels no possible value of $s_w$ will be found
discordant with the data. This is as would be expected.

If we have an independent estimate of error, a mean square $M$ based on $q$ degrees of
freedom, and $F$ is the tabular value of the F-distribution for 1 and $q$ degrees of freedom, and
if we consider only departures of main effects and not of interactions, then the fiducial limit
for $s_w$ is given by the equation

$$v_1 = \frac{(s_1-s_w)(s_w-s_2)}{s_w} = FM,$$

whence

$$s_w = \tfrac12\{s_1 + s_2 - FM + \sqrt{[(s_1-s_2)^2 - 2FM(s_1+s_2) + F^2M^2]}\},$$

the larger root being the relevant one.
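
The fiducial limit can be evaluated directly from the quadratic in $s_w$. The sketch below is one way to do so; the function names are illustrative, and returning `None` when $FM$ exceeds the maximum attainable value $(\sqrt{s_1} - \sqrt{s_2})^2$ of $v_1$ is a design choice made here, reflecting the remark that no value of $s_w$ is then discordant with the data.

```python
import math


def v1_statistic(s1, s2, sw):
    """The criterion v1 = (s1 - sw)(sw - s2) / sw."""
    return (s1 - sw) * (sw - s2) / sw


def fiducial_limit_sw(s1, s2, f_value, mean_square):
    """Larger root of v1(sw) = F * M, or None if v1 can never reach F * M."""
    fm = f_value * mean_square
    v1_max = (math.sqrt(s1) - math.sqrt(s2)) ** 2
    if fm > v1_max:
        return None
    disc = (s1 - s2) ** 2 - 2.0 * fm * (s1 + s2) + fm ** 2
    return 0.5 * (s1 + s2 - fm + math.sqrt(max(disc, 0.0)))
```

A call such as `fiducial_limit_sw(32.4230, 1.7648, 7.71, 0.5208)` uses purely illustrative inputs; substituting the result back into `v1_statistic` should return the product of the last two arguments.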

When $n = 3$, there are likewise two roots $s_1$ and $s_2$, but since sets of weights can be chosen
giving a sum of squares less than $s_2$, the foregoing analysis does not apply. There are then,
in fact, two canonical comparisons, corresponding to $s_1$ and $s_2$, together with a set of $p-2$
comparisons corresponding to zero roots. We can, however, derive a valid test for assigned
weights and for interactions by considering the regression of the $w_j$ on the $c_j$.

Let ${}_1c_j$, ${}_2c_j$ be the sets of weights corresponding to $s_1$, $s_2$ respectively, and put

$$r_1 = \sum_j {}_1c_j w_j, \qquad r_2 = \sum_j {}_2c_j w_j.$$

Then, if we put

$$s_0 = \frac{s_w}{r_1^2 + r_2^2},$$

we have $s_1 > s_0 > s_2$. It can be shown that

$$v_1 = \frac{(s_1 - s_0)(s_0 - s_2)}{s_0}$$

is distributed as $\chi^2$ with 1 degree of freedom, while

$$v_2 = \frac{s_1 s_2}{s_0}$$

is distributed as $\chi^2$ with $p-1$ degrees of freedom. Also

$$v_3 = s_0 - s_w$$

is likewise distributed with $p-2$ degrees of freedom. The test of the chosen weights is
therefore given by $v_1 + v_3$, with $p-1$ degrees of freedom, while $v_2$ gives the test for
interactions. These tests will be discussed more fully in later communications, where it is also
intended to give generalizations to the case of three or more sample latent roots.













IX. NUMERICAL EXAMPLE


The data for the following example are provided by an experiment on the effects of time 
of contact of catalyst (A), and concentration of catalyst (B), on the dry tensile strength of 
a plastic material. Each factor is at three levels, and there are two replications of each 
treatment combination. The replications were carried out over six days, in such a way that 
all the interaction effects were one-half confounded with differences between days, while 
main effects remained unconfounded. 


Table 1. Experimental results (with level of factor A in brackets)

                                    Level of B
    Replication   Day        1            2            3          Total
         1         1     11.61 (1)    11.94 (2)    12.21 (3)      35.76
                   2     11.78 (2)    11.98 (3)    11.94 (1)      35.70
                   3     11.75 (3)    12.72 (1)    10.82 (2)      35.29
         2         4     12.89 (1)    12.38 (3)    11.94 (2)      37.21
                   5     11.99 (3)    12.17 (2)    11.80 (1)      35.96
                   6     11.57 (2)    12.39 (1)    12.11 (3)      36.07
       Total             71.59        73.58        70.82         215.99























The experimental results are set out in Table 1, and the analysis of variance, performed 
without weighting, in Table 2. In the latter, sums of squares and mean squares have been 
multiplied by 18 to avoid rounding-off errors, and this factor is retained throughout the 
calculations. The analysis shows that interactions of the two factors are significant. Further 
interpretation is based on the assumption that the effects of time of contact (A) are 
proportional at different levels of concentration (B). 

The estimates of the simple effects, multiplied by 18, are presented in Table 3. To obtain
the value for the combination of the first level of each factor, for instance, the results of
Table 1 are multiplied by the following set of factors:

         8   -4   -4
        -1    2   -1
        -1   -1    2

         8   -4   -4
        -1    2   -1
        -1   -1    2

and similarly for the other simple effects.
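
The multiplication just described can be reproduced as follows. This is a sketch which assumes that the six rows of multipliers correspond to days 1 to 6 of Table 1 and the three columns to the levels of B; under that reading the weighted sum reproduces the first entry of Table 3.

```python
# Yields from Table 1: one row per day, one column per level of B.
TABLE_1 = [
    [11.61, 11.94, 12.21],
    [11.78, 11.98, 11.94],
    [11.75, 12.72, 10.82],
    [12.89, 12.38, 11.94],
    [11.99, 12.17, 11.80],
    [11.57, 12.39, 12.11],
]

# Multipliers for the first level of A at the first level of B,
# one 3 x 3 block per replication (assumed row/column alignment).
MULTIPLIERS = [
    [8, -4, -4],
    [-1, 2, -1],
    [-1, -1, 2],
] * 2


def simple_effect(yields, multipliers):
    """Sum of products of yields and multipliers (estimate x 18)."""
    return sum(
        m * y
        for row_m, row_y in zip(multipliers, yields)
        for m, y in zip(row_m, row_y)
    )


EFFECT_A1_B1 = simple_effect(TABLE_1, MULTIPLIERS)
```

The value obtained agrees with the entry 0.34 shown in Table 3 for the first level of each factor.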

From Table 3 the values of $324t_{ij}$ are readily derived as sums of squares and products of
the deviations of the values in different rows from their means. The quantities $36t_{ij}$ are set
out in Table 4, together with the matrix $J$.

The sum of squares for each canonical comparison is given by twice the corresponding
root of the equation $|T - \theta J| = 0$.

Consequently, each root $\phi$ of $|36T - \phi J| = 0$
will correspond to a sum of squares multiplied by $\tfrac12 \times 36 = 18$, which is the factor
already being







used in our analyses of variance. The equation for $\phi$ is a cubic, which may be written
explicitly as

$$4\phi^3 - 136.7512\phi^2 + 228.87889056\phi = 0.$$

The roots of this equation, and the corresponding sets of weights, are given in Table 5. From


























































































































Table 2. Analysis of variance (a), main effects unweighted

                          Degrees of    Sum of squares    Mean square
                          freedom       (x 18)            (x 18)
    Blocks                     5           12.7337           2.5467
    Main effects: A            2           12.1706           6.0853*
                  B            2           15.5018           7.7509*
    Interaction, AB            4           22.0172           5.5043*
    Error                      4            2.0832           0.5208
    Total                     17           64.5065

    * Significant at the 5% level.

Table 3. Estimates of simple effects and main effects (x 18)

                              Level of A
    Level of B          1          2          3         Mean
        1             0.34      15.01      -3.17        4.06
        2            -3.77       4.24     -16.46       -5.33
        3            -0.23      -5.00       9.04        1.27
      Mean           -1.22       4.75      -3.53        0.00

Table 4. Sums of squares and products (x 36) of estimates of A effects at different levels of B

         20.6682     19.9398    -13.2504
         19.9398     24.2106    -16.5360
        -13.2504    -16.5360     11.3262

    The matrix J

                  5   -1   -1
        (1/3) x  -1    5   -1
                 -1   -1    5

Table 5. Latent vectors and latent roots

                            Level of B (j)               Latent roots
    Latent vectors       1           2           3       from Table 4
        1c_j          0.55872     0.59747    -0.11208      32.4230
        2c_j          0.59255    -0.23243     0.45275       1.7648
        3c_j         -0.05818     0.50565     0.67016       0.0000

Table 6. Analysis of variance (b), weighted main effects of factor A

                          Degrees of    Sum of squares    Mean square
                          freedom       (x 18)            (x 18)
    Main effects: A            4           32.4230           8.1058*
    Interaction: AB            2            1.7648           0.8824 (n)
    Sum: A + AB                6           34.1878

    * Significant at the 5% level.
    (n) Not significant.

Table 7. Analysis of variance (c), test of assigned weights

                                        Degrees of    Sum of squares    Mean square
                                        freedom       (x 18)            (x 18)
    Assigned weights                         2           31.7844          15.8922
    Departures from assigned weights         2            0.6183           0.3092 (n)
    Interactions                             2            1.7851           0.8925 (n)
    Sum: A + AB                              6           34.1878

    (n) Not significant.






















the weighted analysis of variance, Table 6, we see that the weighting satisfactorily accounts 
for the interaction effects. 
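
The latent roots entering Table 6 can be recovered from the cubic quoted earlier, which factors into $\phi = 0$ and a quadratic. The sketch below solves the quadratic with the printed coefficients; the function name is illustrative.

```python
import math


def latent_roots(a=4.0, b=-136.7512, c=228.87889056):
    """Roots of a*phi^3 + b*phi^2 + c*phi = 0, largest first."""
    disc = b * b - 4.0 * a * c
    root = math.sqrt(disc)
    return ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a), 0.0)


PHI_1, PHI_2, PHI_3 = latent_roots()
```

The two non-zero roots agree with the sums of squares 32.4230 and 1.7648 of Table 6 to the four decimals printed, and their sum equals the total 34.1878 for A and AB.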

In practice it will be convenient to adopt a set of weights which is simpler than that
given by the largest latent root, and it will then be desirable to test the concordance of these
weights with the data. For the present example, the set

$$w_1 = w_2 = \sqrt{\tfrac38}, \quad w_3 = 0$$

suggests itself.
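With the notation of § VIII, and using the results of § IV, we have

$$18 s_w = \tfrac38(20.6682 + 2 \times 19.9398 + 24.2106) = 31.7844.$$

$$r_1 = 2\sqrt{\tfrac38}\,[{}_1c_1 + {}_1c_2 - \tfrac13({}_1c_1 + {}_1c_2 + {}_1c_3)] = 0.98977;$$

similarly, $r_2 = 0.10921$, $r_3 = 0.09176$.
Note, as a check, that $r_1^2 + r_2^2 + r_3^2 = 1$.

It follows that

$$18 s_0 = \frac{18 s_w}{r_1^2 + r_2^2} = 32.0542.$$

Then

$$18 v_1 = 18\, \frac{(s_1 - s_0)(s_0 - s_2)}{s_0} = 0.3484,$$

$$18 v_2 = 18\, \frac{s_1 s_2}{s_0} = 1.7851.$$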


The analysis of variance for testing the adequacy of the proposed weights is set out in 
Table 7. The usual significance tests show that the weights are satisfactory. 
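
The entries of Table 7 can be recomputed from Tables 4 and 5. The sketch below rests on two assumptions that are inferences rather than statements of the text: that $J = \tfrac13(6I - 11')$, which is consistent with the leading coefficient 4 of the cubic, and that the $r_j$ are the products $w'J\,{}_jc$. With the rounded table entries it gives $18s_0 \approx 32.054$ and reproduces the printed sums of squares to about three decimals.

```python
import math

# Table 4 (x 36) and Table 5 (latent vectors; roots already x 18).
T36 = [
    [20.6682, 19.9398, -13.2504],
    [19.9398, 24.2106, -16.5360],
    [-13.2504, -16.5360, 11.3262],
]
VECTORS = [
    (0.55872, 0.59747, -0.11208),
    (0.59255, -0.23243, 0.45275),
    (-0.05818, 0.50565, 0.67016),
]
S1, S2 = 32.4230, 1.7648


def j_product(w, c):
    """w' J c, assuming J = (6I - 11')/3 (an inference, not stated in the text)."""
    dot = sum(a * b for a, b in zip(w, c))
    return (6.0 * dot - sum(w) * sum(c)) / 3.0


def assigned_weight_analysis(w):
    """Sums of squares (x 18) for weights w normalised so that w' J w = 1."""
    sw = sum(w[i] * w[j] * T36[i][j] for i in range(3) for j in range(3))
    r = [j_product(w, c) for c in VECTORS]
    s0 = sw / (r[0] ** 2 + r[1] ** 2)
    v1 = (S1 - s0) * (s0 - S2) / s0   # 1 degree of freedom
    v2 = S1 * S2 / s0                 # interactions
    v3 = s0 - sw                      # p - 2 degrees of freedom
    return {"sw": sw, "r": r, "s0": s0, "v1": v1, "v2": v2, "v3": v3}


W = (math.sqrt(3 / 8), math.sqrt(3 / 8), 0.0)
RESULT = assigned_weight_analysis(W)
```

The sum `v1 + v3` corresponds to the line "Departures from assigned weights" and `v2` to the line "Interactions"; small differences in the last decimal arise from the rounding of the latent vectors.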


REFERENCES 


BARTLETT, M. S. (1947). Multivariate analysis. Suppl. J.R. Statist. Soc. 9, 176.

COCHRAN, W. G. (1943). The comparison of different scales of measurement for experimental results.
Ann. Math. Statist. 14, 205.

FISHER, R. A. (1938). The statistical utilization of multiple measurements. Ann. Eugen., Lond.,
8, 376.

FISHER, R. A. (1940). The precision of discriminant functions. Ann. Eugen., Lond., 10, 422.

FISHER, R. A. (1949). The Design of Experiments. Edinburgh: Oliver and Boyd.

FISHER, R. A. & MACKENZIE, W. A. (1923). Studies in crop variation. II. The manurial response of
different potato varieties. J. Agric. Sci. 13, 311.

HSU, P. L. (1939). On the distribution of roots of certain determinantal equations. Ann. Eugen.,
Lond., 9, 250.

HSU, P. L. (1940). On generalized analysis of variance. I. Biometrika, 31, 221.

WATSON, G. N. (1944). A Treatise on the Theory of Bessel Functions. Cambridge University Press.

YATES, F. (1937). The design and analysis of factorial experiments. Tech. Commun. Bur. Soil Sci.,
Harpenden, no. 35.


Biometrika 39 6 














[ 82 ] 


ON SAMPLING FROM A POPULATION OF RANKERS 


By A. S. C. EHRENBERG 
Statistical Laboratory, University of Cambridge 


1. It is clear from the recent symposium on ranking methods (Moran, Whitfield & Daniels, 
1950) and Kendall’s book (1948) that relatively little is known of the theory of sampling 
from a population of ‘rankers’ or finite rankings—the problem of m rankings—particularly 
under anything but what may be called the ‘null-hypothesis’ of random ranking. A few 
remarks, even if rather incomplete, on measures of agreement among the rankings may 
therefore be of interest. 


2. Consider $m$ rankings of a set of $n$ objects. Of a number of rather similar measures
of agreement, the popular one nowadays is the coefficient of concordance $W$ (Kendall
& Babington Smith, 1939; Kendall, 1948), which is defined as the sum of squares of the
$n$ totals of the $m$ rank values for each object about the mean $\tfrac12 m(n+1)$, divided by the
maximum of this expression, $m^2(n^3-n)/12$, which occurs if all $m$ rankings agree. Here each
ranking is denoted by the numbers 1 to $n$. The null-hypothesis that the ranks are given at
random can be tested by using the fact that under it $(m-1)W/(1-W)$ is distributed
approximately as the variance-ratio F-distribution with $\nu_1 = n-1-2/m$ and
$\nu_2 = (m-1)\nu_1$ degrees of freedom. Exact tables of the distribution when $m$ and $n$ are both
small are given by Kendall (1948), who also discusses the effects of 'ties'. A related test for
randomness, proposed by Friedman (1937), is that $m(n-1)W$ is distributed as $\chi^2$ with $(n-1)$
degrees of freedom, if $m(n-1) > 7$. Stuart (1951) discusses the distribution of $W$ more generally.

The coefficient of concordance $W$ is a generalization of Spearman's well-known rank
correlation coefficient $\rho$, which is defined in terms of the squared differences between the
two rankings (represented again by the numbers 1 to $n$) of each object, i.e.

$$\rho = 1 - \frac{6}{n^3-n} \sum_{i=1}^{n} (\text{1st ranking} - \text{2nd ranking of } i\text{th object})^2.$$

In fact (Kendall, 1948) if $\rho_{av.}$ stands for the mean of the coefficients $\rho$ between the
${}^mC_2$ pairs of rankings, then

$$\rho_{av.} = \frac{mW - 1}{m - 1}.$$
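
The relation between $W$ and the average Spearman coefficient is easily verified numerically. The sketch below computes both from their definitions, assuming untied rankings given as tuples in which the $i$th entry is the rank of the $i$th object; the function names are illustrative.

```python
from itertools import combinations


def concordance_w(rankings):
    """Coefficient of concordance W for m untied rankings of n objects."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean = m * (n + 1) / 2
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m * m * (n ** 3 - n))


def spearman_rho(a, b):
    """Spearman's rho between two untied rankings."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n ** 3 - n)


def mean_rho(rankings):
    """Average of rho over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)
```

For any set of untied rankings, `mean_rho(rankings)` should equal `(m * W - 1) / (m - 1)` up to rounding.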


Now, whatever is felt in general about the relative merits as measures of rank correlation
of Spearman's and other coefficients such as Kendall's $\tau$ (see below), in this case the Spearman
approach does not seem so appropriate. In it, the ranking of any two objects can be too
much affected by what has happened to the other objects, which matters particularly if
the objects to be ranked are arbitrarily selected. For example, if six out of seven rankers
($m = 7$) rank a certain object, $i$, highest, and all seven rank another object, $j$, second, then
$i$ is ranked higher than $j$ significantly more often at about the 5% probability level. Yet if
the seventh person gives object $i$ some rank bigger than 8, then $j$ would be ranked higher
than $i$ according to the '$\rho$-$W$' kind of reckoning; what may make this more absurd is that
it could happen only if there are actually more than eight objects to be ranked.







3. Some coefficient based on the number $r_{ij}$ of the $m$ rankers who agree on ordering object
$i$ higher than $j$ ($i, j = 1, \ldots, n$) seems more attractive than $W$. It can easily be seen that
about the simplest function of the $r_{ij}$ to handle is $r_{ij}(m - r_{ij})$. An attractive property
of this, compared with any possible linear function such as max. $(r_{ij}, m - r_{ij})$, is that
strong agreement about any one pair of objects is weighted heavily. Constructing a coefficient $u$
which takes the value 1 for complete agreement and whose minimum value is roughly zero, one
obtains

$$u = 1 - \frac{2}{MN} \sum_{i<j} r_{ij}(m - r_{ij}),$$

where $M = {}^mC_2$, $N = {}^nC_2$. This is, in fact, the average of the ${}^mC_2$ 'Kendall' rank
correlation coefficients $\tau$ between the $m$ rankings taken in pairs, where $\tau$ is defined by the
same formula, putting $m = 2$. The precise minimum value of $u$ is $-1/(m-1)$ for $m$ even and
$-1/m$ for $m$ odd, and its expected value for random rankings is 0. Kendall (1948) claims for $W$
that it ranges from 0 to 1, and a similar coefficient might be preferred here, i.e. one related to
$u$ in the same way that $W$ is related to $\rho_{av.}$. However, $W$ can actually only take the value
of 0 if $m(n+1)$ is even, and the coefficients $u$ and $\rho_{av.}$ seem slightly more attractive
measures since for each there are two points whose meaning is unaltered by changes in $m$ and $n$.
It is important to note that $u$ is simple to calculate.
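
To illustrate how simple the calculation is, the sketch below computes $u$ from a set of untied rankings, taking $u = 1 - (2/MN)\sum_{i<j} r_{ij}(m - r_{ij})$ with the sum over unordered pairs of objects. The function name is illustrative, each ranking is assumed to be a tuple whose $i$th entry is the rank given to object $i$, and rank 1 is treated as highest. Exact fractions are used so that the boundary values can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def agreement_u(rankings):
    """Coefficient of agreement u for m untied rankings of n objects."""
    m, n = len(rankings), len(rankings[0])
    big_m, big_n = comb(m, 2), comb(n, 2)
    total = 0
    for i, j in combinations(range(n), 2):
        # r_ij: number of rankers placing object i above object j.
        r_ij = sum(1 for r in rankings if r[i] < r[j])
        total += r_ij * (m - r_ij)
    return 1 - Fraction(2 * total, big_m * big_n)
```

Complete agreement gives 1, and for two rankings the value coincides with Kendall's $\tau$.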


4. It is, perhaps, not surprising that the coefficient $u$ turns out to be identical with the
coefficient of agreement $u$ for paired comparisons defined by Kendall & Babington Smith
(1940; Kendall, 1948), who, however, considered it as 'not of much practical value' for
ranked data. In the method of paired comparisons, $n$ objects are ranked in pairs by each
of the $m$ observers, there being $N = {}^nC_2$ pairs in all. It is clear that a person may make
comparisons inconsistent with a one-dimensional ranking of the $n$ objects, and it is one of
the attractions of the method that this can be tested (e.g. Kendall, 1948).

For the null-hypothesis that the comparisons have been made at random, it has been
shown (e.g. Kendall, 1948) that the distribution of $u$ can be approximated to by a $\chi^2$ form
as follows:

$$\chi^2 = (m-2)\nu(u+1) - (m-3)\nu,$$

where $\nu = 2NM/(m-2)^2$ is the appropriate number of degrees of freedom. As the number
of objects $n$ increases, the distribution tends to normality with variance $1/MN$.

For the non-null case, suppose that the probability of ranking the $i$th object higher than
the $j$th is $\pi_{ij}$, estimated by $r_{ij}/m$. (It is unnecessary to specify any relations between
the $\pi$'s.) The expected value of $u$ is then given by

$$E(u) = 1 - \frac{4}{N} \sum_{i<j} \pi_{ij}(1 - \pi_{ij}).$$

Further, considering the first few moments,* it is easy to show that the distribution of $u$
tends to normality for either large $m$ or $n$, with variance

$$\text{var.}(u) = \frac{8}{MN^2} \sum_{i<j} \{(m-1)[\pi_{ij}(1-\pi_{ij}) - 4\pi_{ij}^2(1-\pi_{ij})^2] + 2\pi_{ij}^2(1-\pi_{ij})^2\},$$

except that for $n$ fixed the distribution is a $\chi^2$ under the null-hypothesis of all
$\pi$'s $= \tfrac12$, as already mentioned. This compares with the distribution of the multiple
correlation


* Mr B. Babington Smith has kindly drawn my attention to similar formulae described by him in
the discussion of the paper by Professor A. S. C. Ross (1950: J.R. Statist. Soc. B, 12, 54).




coefficient $R$ when the population value is or is not equal to zero (Fisher, 1928). It may be
noted that var. $(u)$ in general can be larger than the value $1/MN$ in the null-case.

5. As is to be expected, the distribution of $u$ from ranked data, where in each ranking
the comparisons of pairs of objects cannot be independent, is not so simple. But certain
results come easily enough. First one should notice that it follows at once from the well-
known result for the correlation of $\rho$ and $\tau$ under the hypothesis of random ranking (e.g.
Kendall, 1948), that the correlation of $u$ and $W$ is

$$\frac{2(n+1)}{\sqrt{(4n^2 + 10n)}},$$

which tends to 1 very rapidly as $n$ increases. (Clearly all coefficients which are at all
reasonable should agree in this case.) But for non-random rankings the values of the two
coefficients can differ; for example, for 'circular' rankings ($m = n$), e.g.

    1 2 3
    2 3 1
    3 1 2,

$W = 0$ and $u = \tfrac13 - 4/\{3(m-1)\} \to \tfrac13$ as $m$ increases (cf. § 2).
Kendall & Babington Smith (1940) give, as 'a matter of theoretical interest', a $\chi^2$
approximation for the sampling distribution of $u$ on the hypothesis of random ranking.


Since the result there contained an error, no doubt merely a misprint, it seems worth while
to derive it anew. The first few moments of $u$ are

$$E(u) = 0,$$

$$E(u^2) = \frac{2n+5}{9MN},$$

$$E(u^3) = \frac{2(m-2)(2n^2+6n+7)}{27M^2N^2},$$
Elut 2MN(2n+5)? (6n3+21n?+31n+31) (m—2)(m—3) (2n3 + 8n? + 12n+9) 
adie ala 225 " 27 MNS 


where again $M = {}^mC_2$, $N = {}^nC_2$. These expressions are obtained by generalizing Daniels's
(1944) method for deriving the sampling variance of $\tau$; it should be noted that the account
of this in Kendall's book (1948) seems to contain a number of false steps, besides some
confusion (cf. Moran, 1951) due to taking over the notation of Daniels's original paper.

The Pearsonian coefficients $\beta_1$ and $\beta_2$ are of the order $8/n$ and $(3 + 12/n)$ respectively,
indicating a type III distribution, which turns out to be

$$\chi^2 = \frac{6(2n+5)MN}{(m-2)(2n^2+6n+7)}\, u + \nu,$$

with

$$\nu = \frac{2(2n+5)^3 MN}{(m-2)^2(2n^2+6n+7)^2}$$

degrees of freedom.


For large n the distribution of u tends to normality with variance (2n + 5)/(9MN).
To test the goodness of fit of the $\chi^2$ approximation the exact distribution of u has been
calculated for some small values of m and n; the labour of this increases very rapidly with







m and n. For three objects and four or five rankers, and for four objects and three rankers,
we have frequencies f as follows:

n = 3, m = 4    18u    18,  12,  10,   6,   4,   2,   0,  −2,  −4,  −6;
                f       1,   8,   6,  20,  24,   6,  28,  60,  48,  15;
n = 3, m = 5    30u    30,  22,  18,  14,  10,   6,   2,  −2,  −6;
                f       1,  10,  20,  30, 100,  95, 240, 430, 370;
n = 4, m = 3    18u    18,  14,  10,   6,   2,  −2,  −6;
                f       1,   9,  33,  82, 135, 165, 151.
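Exact distributions of this kind can be obtained by direct enumeration. The following sketch assumes that u is the mean of Kendall's $\tau$ over all pairs of rankings and holds the first ranking in the natural order, which leaves the distribution unchanged and gives totals of $(n!)^{m-1}$; it is an illustration, not the method used for the table.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations, product


def tau(a, b):
    """Kendall's tau between two rank vectors without ties."""
    n = len(a)
    s = sum(1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
            for i, j in combinations(range(n), 2))
    return Fraction(s, n * (n - 1) // 2)


def exact_distribution(m, n):
    """Frequencies of MN*u over all choices of rankings 2..m, ranking 1 being fixed."""
    perms = list(permutations(range(1, n + 1)))
    N = n * (n - 1) // 2
    # Table of tau for every pair of permutations, to avoid recomputation.
    table = {(p, q): tau(p, q) for p in perms for q in perms}
    first = perms[0]
    freq = Counter()
    for rest in product(perms, repeat=m - 1):
        rankings = (first,) + rest
        total = sum(table[a, b] for a, b in combinations(rankings, 2))
        freq[int(total * N)] += 1   # MN*u equals N times the sum of the taus
    return freq


for value, count in sorted(exact_distribution(4, 3).items(), reverse=True):
    print(value, count)
```

The enumeration grows as $(n!)^{m-1}$, which is the rapid increase of labour referred to in the text.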


Applying a continuity correction of taking the mean of the calculated u and the next
smallest value (for large m, n, subtract 1/MN for m even, 2/MN for m odd), one obtains the
following probabilities for exceeding certain values in percentages:


                                  n = 3, m = 4    n = 3, m = 5        n = 4, m = 3
Exact probabilities               6.94, 4.17      4.71, 2.39, 0.85    21.70, 7.47, 1.73
$\chi^2$-probabilities (approx.)  6.3, 2.8        4.5, 2.3, 0.9       20.5, 5.0, 2.0


The agreement is clearly good enough, considering the very small values of m and n. 


6. In the non-random case, if there are n objects, there are n! ways of ranking them, 
which seems to require n! parameters—generally far too large a number to handle, as 
Kendall, for example, pointed out at the recent symposium (Moran et al. 1950). 

But if n = 3, there are only five independent parameters. Thus, consider a population in
which the probabilities of ranking are:


Ranking        123    321    231    213    312    132
Probability    $p_1$  $p_2$  $p_3$  $p_4$  $p_5$  $p_6$

where $\sum_{i=1}^{6} p_i = 1$. Then the moments of the sample value t of the coefficient $\tau$ between rankings
drawn from two such populations (the second being distinguished by dashes) are given by

$$E(t^{2k+1}) = q_1q_1' + q_2q_2' + q_3q_3' - 3^{-2k-1}(q_1q_2' + q_1'q_2 + q_2q_3' + q_2'q_3 + q_3q_1' + q_3'q_1);$$

$$E(t^{2k}) = r_1r_1' + r_2r_2' + r_3r_3' + 3^{-2k}(r_1r_2' + r_1'r_2 + r_2r_3' + r_2'r_3 + r_3r_1' + r_3'r_1);$$

where $q_1 = p_1 - p_2$, $q_2 = p_3 - p_4$, $q_3 = p_5 - p_6$
and $r_1 = p_1 + p_2$, $r_2 = p_3 + p_4$, $r_3 = p_5 + p_6$.


Clearly, the distribution of an average of a number of such correlations would rapidly tend
to normality with a variance that could easily be calculated. Exactly the same result
holds, of course, for Spearman’s $\rho$, if one substitutes powers of $\tfrac12$ for powers of $\tfrac13$ in the
expressions for the moments. Whilst in principle it is possible to obtain similar results for
n > 3, no reasonably simple way of writing the expressions is apparent; in the case of four
objects nearly fifty parameters would have to be estimated and for n = 8, say, almost one
hundred thousand are involved. Again, for more than two rankings (m > 2), the distribution
of a coefficient such as u does not seem amenable to simple treatment even when the rankings
are all drawn from the same population and there are three objects only.
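The three-object moment formulae can be checked against a direct sum over all 36 pairs of rankings. The sketch below assumes the pairing of each ranking with its complete reversal, i.e. $q_1 = p_1 - p_2$, $r_1 = p_1 + p_2$ and so on, with the rankings read as ranks assigned to three objects; the probability vectors in the usage lines are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import combinations

# Order of the rankings matches p_1, ..., p_6 in the text.
RANKINGS = [(1, 2, 3), (3, 2, 1), (2, 3, 1), (2, 1, 3), (3, 1, 2), (1, 3, 2)]


def tau(a, b):
    s = sum(1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
            for i, j in combinations(range(3), 2))
    return Fraction(s, 3)


def moment_direct(p, p_dash, k):
    """E(t^k) by summing over every pair of rankings."""
    return sum(pi * pj * tau(a, b) ** k
               for a, pi in zip(RANKINGS, p)
               for b, pj in zip(RANKINGS, p_dash))


def moment_formula(p, p_dash, k):
    """E(t^k) from the q's (odd k) or the r's (even k)."""
    sign = -1 if k % 2 else 1
    x = [p[i] + sign * p[i + 1] for i in (0, 2, 4)]
    y = [p_dash[i] + sign * p_dash[i + 1] for i in (0, 2, 4)]
    same = sum(x[i] * y[i] for i in range(3))
    cross = sum(x[i] * y[j] for i in range(3) for j in range(3) if i != j)
    return same + sign * Fraction(1, 3 ** k) * cross


p = [Fraction(1, 2), Fraction(0), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8), Fraction(0)]
p_dash = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 6), Fraction(1, 12), Fraction(1, 6), Fraction(1, 12)]
for k in range(1, 5):
    print(k, moment_direct(p, p_dash, k), moment_formula(p, p_dash, k))
```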











86 On sampling from a population of rankers 


Considerable simplification in correlating two rankings is possible if one of them is given
a priori, i.e. is fixed. If the fixed ranking is put in the ‘natural’ order 1, 2, 3, etc., the results
of the last paragraph, for example, would reduce to

$$E(t^{2k+1}) = q_1 - \frac{q_2+q_3}{3^{2k+1}},$$

$$E(t^{2k}) = r_1 + \frac{r_2+r_3}{3^{2k}} = r_1 + \frac{1-r_1}{3^{2k}}.$$


Generally, with one ranking fixed, $\tau$ does not distinguish between certain different cases in
the other ranking, i.e. all those for which the minimum number of interchanges of pairs of
objects which transform them to the given ranking (cf. Kendall, 1948) are equal. For three
objects, for example, both rankings 132 and 213 need at least one interchange, the rankings
231 and 312 need two, and the ranking 321 needs three interchanges to transform them to
the order 123. All this results in a radical reduction in the number of parameters required,
from n! to $N = {}^{n}C_{2}$ independent parameters. Thus, in correlating with a fixed ranking, let $P_s$
be the probability of sampling a ranking requiring a minimum of s interchanges to reduce
it to the given ranking (s = 0, 1, 2, ..., N). Then the moments of t are

$$E(t^k) = \sum_{s=0}^{N}\left(1 - \frac{2s}{N}\right)^{k} P_s,$$

which even for n = 8 objects, say, has less than thirty terms, each very simple to estimate
if sufficient data are available, as in the problem of m rankings. Here the average of the
m coefficients $\tau$ with the fixed ranking would rapidly tend to normality, with variance
2(2n+5)/9mn(n−1) on the null-hypothesis of random rankings, and otherwise with
variance obtainable from the uncorrected moments just given, the probabilities $P_s$ being
estimated from the m rankings.
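The reduction to the probabilities $P_s$ is easy to illustrate. The sketch below assumes that the minimum number of interchanges is the number of inversions of the ranking relative to the natural order, so that $t = 1 - 2s/N$; under random ranking the $P_s$ are obtained by counting permutations. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def inversions(perm):
    """Number of pairs out of natural order, i.e. the minimum number of adjacent interchanges."""
    return sum(1 for i in range(len(perm))
               for j in range(i + 1, len(perm)) if perm[i] > perm[j])


def null_interchange_probs(n):
    """P_0, ..., P_N when all n! rankings are equally likely."""
    N = n * (n - 1) // 2
    counts = [0] * (N + 1)
    total = 0
    for perm in permutations(range(n)):
        counts[inversions(perm)] += 1
        total += 1
    return [Fraction(c, total) for c in counts]


def tau_moment(P, k):
    """E(t^k) = sum over s of (1 - 2s/N)^k P_s."""
    N = len(P) - 1
    return sum((1 - Fraction(2 * s, N)) ** k * Ps for s, Ps in enumerate(P))


P = null_interchange_probs(5)
print(tau_moment(P, 1), tau_moment(P, 2))
```

With probabilities estimated from data in place of the null values, the same `tau_moment` function gives the uncorrected moments needed for the variance of the average.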


7. One other kind of coefficient may be mentioned in conclusion. Instead of forming

$$u = \sum_{a<b}^{m} t_{ab}/M,$$

where $t_{ab}$ is the rank correlation between rankings a and b, one could consider a coefficient
which is insensitive to complete reversals of order in any one ranking, such as

$$u' = \sum_{a<b}^{m} t_{ab}^{2}/M.$$

The mean and variance under the null hypothesis are easily seen to be given by

$$E(u') = \frac{2(2n+5)}{9n(n-1)},$$

$$9MN^3\,\mathrm{var}\,(u') = \frac{n(n-1)(2n+5)^2}{9} - \frac{2(6n^3+21n^2+31n+31)}{25}.$$


Apart from testing for the rather peculiar kind of non-random asymmetry implied 
(cf. example in Kendall, 1948, p. 89), no particular applications of this coefficient are at 
present apparent. It is mentioned because it seems a rather simpler form of a suggestion 
made recently by Moran (Moran et al. 1950, p. 157). 
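The defining property of such a coefficient, and its null mean, can be checked by enumeration for very small cases. This sketch assumes $u'$ is the mean of $t_{ab}^2$ over all pairs of rankings; the sample rankings are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def tau(a, b):
    n = len(a)
    s = sum(1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
            for i, j in combinations(range(n), 2))
    return Fraction(s, n * (n - 1) // 2)


def u_dash(rankings):
    """Mean of the squared rank correlations over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(tau(a, b) ** 2 for a, b in pairs) / len(pairs)


def reverse(ranking):
    """Complete reversal of order: rank r becomes n + 1 - r."""
    n = len(ranking)
    return tuple(n + 1 - r for r in ranking)


def null_mean_and_variance(m, n):
    """Mean and variance of u' over all (n!)^m equally likely sets of rankings."""
    perms = list(permutations(range(1, n + 1)))
    values = [u_dash(sample) for sample in product(perms, repeat=m)]
    mean = sum(values) / len(values)
    var = sum(v * v for v in values) / len(values) - mean ** 2
    return mean, var


sample = [(1, 2, 3, 4), (2, 1, 4, 3), (3, 1, 4, 2)]
changed = [sample[0], reverse(sample[1]), sample[2]]
print(u_dash(sample), u_dash(changed))
print(null_mean_and_variance(3, 3))
```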


This note has been written in connexion with work carried out on behalf of the Depart- 
ment of Scientific and Industrial Research (Food Investigation Organization). 
















A. S. C. EHRENBERG 87


REFERENCES 


DANIELS, H. E. (1944). The relation between measures of correlation in the universe of sample
permutations. Biometrika, 33, 129.

FISHER, R. A. (1928). The general sampling distribution of the multiple correlation coefficient. Proc.
Roy. Soc. A, 121, 654.

FRIEDMAN, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Amer. Statist. Ass. 32, 675.

KENDALL, M. G. (1948). Rank Correlation Methods. London: Griffin and Co.

KENDALL, M. G. & BABINGTON SMITH, B. (1939). The problem of m rankings. Ann. Math. Statist.
10, 275.

KENDALL, M. G. & BABINGTON SMITH, B. (1940). On the method of paired comparisons. Biometrika,
31, 324.

MORAN, P. A. P. (1951). Rank Correlation Methods (Review). J.R. Statist. Soc. A, 114, 99.

MORAN, P. A. P., WHITFIELD, J. W. & DANIELS, H. E. (1950). Symposium on ranking methods.
J.R. Statist. Soc. B, 12, 153.

STUART, A. (1951). An application of the distribution of the ranking concordance coefficient.
Biometrika, 38, 33.













LEAST-SQUARES ESTIMATION OF LOCATION AND SCALE 
PARAMETERS USING ORDER STATISTICS 


By E. H. LLOYD, Imperial College, London 


1. INTRODUCTION 


In this paper we are concerned with distributions which depend on location and scale 
parameters only. For such distributions it will be shown that the parameters may be 
estimated by applying general least-squares theory to an ordered sample, the resulting 
estimates being unbiased, linear in the ordered observations, and of minimal variance. 
Explicit formulae are obtained for the estimates, and for their variances and covariance. 

The special case of symmetrical distributions is discussed in some detail, and it is shown 
that for these the estimates are uncorrelated. As examples, the rectangular and the normal 
distributions are discussed; in the normal case the ‘ordered’ estimate of the mean turns 
out to be the sample mean, whereas in the rectangular case the ordered estimates of mean 
and range are functions of the extreme observations only, with sampling variances of 
order $n^{-2}$.

The ‘ordered’ estimate of the population mean has a sampling variance which never 
exceeds that of the sample mean; it is equal to that of the sample mean if and only if the 
row-totals of the variance matrix of the ordered observations all have the same sum. The 
‘ordered’ estimate then becomes the sample mean itself. When the variance matrix 
does not satisfy this condition the ‘ordered’ estimate of the population mean has strictly 
smaller sampling variance than has the sample mean. 


2. EXISTENCE OF LEAST-SQUARES ESTIMATES 


We consider the estimation of the location and scale parameters $\mu$, $\sigma$ (not necessarily the
mean and standard deviation) of a variate X whose distribution depends on only these two
parameters. Let $(X_1, X_2, \ldots, X_n)$ be a sample of n independent observations on X. Arrange
the $X_r$ in ascending order of magnitude and denote the ordered set by $(Y_1, Y_2, \ldots, Y_n)$, so that

$$Y_1 \le Y_2 \le \ldots \le Y_n. \tag{2·1}$$

We shall consider unbiased estimates which are linear functions of these ordered
observations, and we first prove that the parameters may both be estimated by functions of this
type which have minimal variance. To do this we introduce the standardized variates

$$U_r = (X_r - \mu)/\sigma,$$

which may be regarded as independent observations on the standardized variate
$U = (X - \mu)/\sigma$, whose distribution is parameter-free. We arrange the $U_r$ in ascending order
of magnitude, denoting the ordered set by $(V_1, V_2, \ldots, V_n)$. Then

$$V_r = (Y_r - \mu)/\sigma$$

and

$$V_1 \le V_2 \le \ldots \le V_n.$$








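Let

$$\mathcal{E}(V_r) = \alpha_r, \qquad \mathrm{var}\,(V_r) = \omega_{rr}, \qquad \mathrm{cov}\,(V_r, V_s) = \omega_{rs}; \tag{2·2}$$

these quantities have known values depending on the form of the parent distribution but
not on the parameters $\mu$ and $\sigma$.


Reverting now to the original ordered observations we clearly have

$$\mathcal{E}(Y_r) = \mu + \sigma\alpha_r, \qquad \mathrm{var}\,(Y_r) = \sigma^2\omega_{rr}, \qquad \mathrm{cov}\,(Y_r, Y_s) = \sigma^2\omega_{rs}. \tag{2·3}$$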


Since the ordered observations have expectations which are linear functions of the
parameters $\mu$ and $\sigma$, with known coefficients, and variances and covariances which are
known up to a scale factor $\sigma^2$, the least-squares theorem of Gauss and Markoff applies (see,
for example, Aitken, 1935). The parameters are therefore estimable by unbiased linear
functions of the $Y_r$, of minimal variance.


[Godwin (1949a) has recently proved a similar theorem for the estimation of dispersion 
alone. ] 
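3. THE LEAST-SQUARES ESTIMATES OF $\mu$ AND $\sigma$
We write the equations (2·3) in matrix form, as follows:

$$\mathcal{E}(\mathbf{Y}) = \mu\mathbf{1} + \sigma\boldsymbol{\alpha},$$

where $\mathbf{Y}$ is the vector of the $Y_r$, $\boldsymbol{\alpha}$ the vector of the $\alpha_r$, and $\mathbf{1}$ a vector with unit elements.
This equation may be written more compactly as

$$\mathcal{E}(\mathbf{Y}) = \mathbf{p}\boldsymbol{\theta}, \tag{3·1}$$

where $\mathbf{p}$ is the (n × 2) matrix $(\mathbf{1}, \boldsymbol{\alpha})$, and $\boldsymbol{\theta}' = (\mu, \sigma)$. The variance matrix of the $Y_r$, i.e. the
matrix of variances and covariances, is

$$V(\mathbf{Y}) = \sigma^2\boldsymbol{\omega}, \tag{3·2}$$

where $\boldsymbol{\omega}$ is the (n × n) symmetric positive-definite matrix of the $\omega_{rs}$.
The required estimator of the vector $\boldsymbol{\theta}$ of parameters is given by

$$\hat{\boldsymbol{\theta}} = (\mathbf{p}'\boldsymbol{\Omega}\mathbf{p})^{-1}\mathbf{p}'\boldsymbol{\Omega}\mathbf{Y}, \tag{3·3}$$

where $\boldsymbol{\Omega} = \boldsymbol{\omega}^{-1}$. The variance matrix of the estimates is $(\mathbf{p}'\boldsymbol{\Omega}\mathbf{p})^{-1}\sigma^2$, where

$$\mathbf{p}'\boldsymbol{\Omega}\mathbf{p} = \begin{pmatrix} \mathbf{1}'\boldsymbol{\Omega}\mathbf{1} & \mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha} \\ \mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha} & \boldsymbol{\alpha}'\boldsymbol{\Omega}\boldsymbol{\alpha} \end{pmatrix}, \tag{3·4}$$

the elements of this matrix being, of course, scalars. The inverse of this matrix is

$$(\mathbf{p}'\boldsymbol{\Omega}\mathbf{p})^{-1} = \frac{1}{\Delta}\begin{pmatrix} \boldsymbol{\alpha}'\boldsymbol{\Omega}\boldsymbol{\alpha} & -\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha} \\ -\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha} & \mathbf{1}'\boldsymbol{\Omega}\mathbf{1} \end{pmatrix},$$

where $\Delta$ is the determinant of the matrix $\mathbf{p}'\boldsymbol{\Omega}\mathbf{p}$.
Using these results in (3·3) we find for the estimates

$$\hat{\mu} = -\boldsymbol{\alpha}'\boldsymbol{\Gamma}\mathbf{Y}, \qquad \hat{\sigma} = \mathbf{1}'\boldsymbol{\Gamma}\mathbf{Y},$$

where $\boldsymbol{\Gamma}$ is the skew-symmetric matrix defined by

$$\boldsymbol{\Gamma} = \boldsymbol{\Omega}(\mathbf{1}\boldsymbol{\alpha}' - \boldsymbol{\alpha}\mathbf{1}')\boldsymbol{\Omega}/\Delta.$$

The variances and the covariance of these estimates are given by

$$\mathrm{var}\,(\hat{\mu}) = \boldsymbol{\alpha}'\boldsymbol{\Omega}\boldsymbol{\alpha}\,\sigma^2/\Delta, \qquad \mathrm{var}\,(\hat{\sigma}) = \mathbf{1}'\boldsymbol{\Omega}\mathbf{1}\,\sigma^2/\Delta, \qquad \mathrm{cov}\,(\hat{\mu}, \hat{\sigma}) = -\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha}\,\sigma^2/\Delta. \tag{3·6}$$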
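The estimator (3·3) can be computed mechanically once $\boldsymbol{\alpha}$ and $\boldsymbol{\omega}$ are known. The sketch below works in exact fractions and returns the coefficients that the ordered observations receive in $\hat\mu$ and $\hat\sigma$. As an illustration only, it is applied to the rectangular distribution on (0, 1) with the lower end as location and the range as scale, for which the standard order-statistic moments $\alpha_r = r/(n+1)$ and $\omega_{rs} = r(n-s+1)/(n+1)^2(n+2)$, $r \le s$, are assumed; all function names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact fractions."""
    n = len(a)
    aug = [list(map(Fraction, row)) + [Fraction(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def blue_weights(alpha, omega):
    """Coefficients of Y in mu-hat and sigma-hat, from (p' Omega p)^{-1} p' Omega."""
    n = len(alpha)
    om_1 = solve(omega, [Fraction(1)] * n)     # Omega 1
    om_a = solve(omega, alpha)                 # Omega alpha
    s11 = sum(om_1)                            # 1' Omega 1
    s1a = sum(om_a)                            # 1' Omega alpha
    saa = sum(a * v for a, v in zip(alpha, om_a))   # alpha' Omega alpha
    delta = s11 * saa - s1a ** 2               # determinant of p' Omega p
    w_mu = [(saa * x - s1a * y) / delta for x, y in zip(om_1, om_a)]
    w_sigma = [(s11 * y - s1a * x) / delta for x, y in zip(om_1, om_a)]
    return w_mu, w_sigma


def uniform01_moments(n):
    """Means and covariances of order statistics from the uniform distribution on (0, 1)."""
    alpha = [Fraction(r, n + 1) for r in range(1, n + 1)]
    omega = [[Fraction(min(r, s) * (n + 1 - max(r, s)), (n + 1) ** 2 * (n + 2))
              for s in range(1, n + 1)] for r in range(1, n + 1)]
    return alpha, omega


w_mu, w_sigma = blue_weights(*uniform01_moments(4))
print(w_mu)      # coefficients for the lower end
print(w_sigma)   # coefficients for the range
```

Only the two extreme observations receive non-zero coefficients in this illustration, the interior coefficients being exactly zero.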
















4. SYMMETRY PROPERTIES OF ORDERED OBSERVATIONS OBTAINED 
FROM A SYMMETRIC PARENT 


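From now on we restrict our attention to symmetric distributions. For these we shall take
the location parameter $\mu$ to be the population mean.
The ordered values $V_r$ of observations on the standardized variate U satisfy the inequalities

$$V_1 \le V_2 \le \ldots \le V_n$$

and therefore also

$$-V_n \le -V_{n-1} \le \ldots \le -V_1.$$

Hence the set $(-V_n, \ldots, -V_1)$ may be regarded as ordered observations on the variate −U.
Now, since U is symmetrically distributed about a zero mean, its distribution coincides
with that of −U. Thus the joint distribution of the ordered observations

$$(V_1, V_2, \ldots, V_n) \text{ is the same as that of the set } (-V_n, -V_{n-1}, \ldots, -V_1); \tag{4·1}$$

in particular, the two sets have the same means and the same variance matrix.
We can express this succinctly in matrix form by introducing the (n × n) permutation
matrix $\mathbf{J}$,

$$\mathbf{J} = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 0 & 0 \end{pmatrix}.$$

It may be noted that $\mathbf{J}$ is symmetric and orthogonal, and its row totals are all unity:

$$\mathbf{J} = \mathbf{J}' = \mathbf{J}^{-1}, \qquad \mathbf{J}\mathbf{1} = \mathbf{1}. \tag{4·2}$$

When it is used as a pre-multiplier, $\mathbf{J}$ has the effect of reversing the order of the rows of the
matrix on which it operates. Thus

$$\mathbf{J}\begin{pmatrix} V_1 \\ V_2 \\ \vdots \\ V_n \end{pmatrix} = \begin{pmatrix} V_n \\ V_{n-1} \\ \vdots \\ V_1 \end{pmatrix}.$$

The assertion (4·1) thus means that the vector variates $\mathbf{V}$ and $-\mathbf{J}\mathbf{V}$ have the same
distribution. In particular

$$\mathcal{E}(\mathbf{V}) = \boldsymbol{\alpha} = \mathcal{E}(-\mathbf{J}\mathbf{V}),$$

$$V(\mathbf{V}) = \boldsymbol{\omega} = V(-\mathbf{J}\mathbf{V}).$$

It follows that

$$\boldsymbol{\alpha} = -\mathbf{J}\boldsymbol{\alpha}, \qquad \boldsymbol{\omega} = \mathbf{J}\boldsymbol{\omega}\mathbf{J}.$$

Inverting both members of the last equation we obtain

$$\boldsymbol{\Omega} = \mathbf{J}\boldsymbol{\Omega}\mathbf{J}.$$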
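The symmetry relations are simple to verify for a particular symmetric parent. The sketch below uses the standardized rectangular distribution as an illustration, with $\alpha_r/\sqrt3 = (2r-n-1)/(n+1)$ and $\omega_{rs} = 12r(n-s+1)/(n+1)^2(n+2)$ for $r \le s$; pre-multiplication by $\mathbf{J}$ is represented by reversing rows and post-multiplication by reversing columns. Function names are hypothetical.

```python
from fractions import Fraction


def rectangular_moments(n):
    """alpha_r / sqrt(3) and omega_rs for the standardized rectangular distribution."""
    a = [Fraction(2 * r - n - 1, n + 1) for r in range(1, n + 1)]
    w = [[Fraction(12 * min(r, s) * (n - max(r, s) + 1), (n + 1) ** 2 * (n + 2))
          for s in range(1, n + 1)] for r in range(1, n + 1)]
    return a, w


def j_times_vector(v):
    """J v: the elements of v in reverse order."""
    return v[::-1]


def j_matrix_j(w):
    """J w J: rows and columns of w both reversed."""
    return [row[::-1] for row in w[::-1]]


a, w = rectangular_moments(5)
print(a == [-x for x in j_times_vector(a)])   # alpha = -J alpha
print(w == j_matrix_j(w))                     # omega = J omega J
```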


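5. ESTIMATION OF $\mu$ AND $\sigma$ FOR A SYMMETRIC DISTRIBUTION


To proceed with the actual estimation we first prove that in the case of a symmetric
parent the matrix $\mathbf{p}'\boldsymbol{\Omega}\mathbf{p}$ of (3·4) is diagonal. Its off-diagonal elements are each proportional
to $\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha}$, and, on applying the symmetry properties developed above, we have

$$\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha} = \mathbf{1}'(\mathbf{J}\boldsymbol{\Omega}\mathbf{J})(-\mathbf{J}\boldsymbol{\alpha}) = -\mathbf{1}'\mathbf{J}\boldsymbol{\Omega}\mathbf{J}^2\boldsymbol{\alpha} = -\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha},$$
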


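since $\mathbf{1}'\mathbf{J} = \mathbf{1}'$ and $\mathbf{J}^2 = \mathbf{I}$, the unit matrix. The element $\mathbf{1}'\boldsymbol{\Omega}\boldsymbol{\alpha}$ is therefore equal to its own
negative, and so is zero. The matrix $\mathbf{p}'\boldsymbol{\Omega}\mathbf{p}$ is thus diagonal, as is its inverse. Since this
inverse is proportional to the variance matrix of our estimates it follows that, in the case
of a symmetric parent, $\hat\mu$ and $\hat\sigma$ are uncorrelated.
The inverse of $\mathbf{p}'\boldsymbol{\Omega}\mathbf{p}$ now takes the simple form

$$\mathrm{diag}\,(1/\mathbf{1}'\boldsymbol{\Omega}\mathbf{1},\ 1/\boldsymbol{\alpha}'\boldsymbol{\Omega}\boldsymbol{\alpha}),$$

and the estimates become

$$\hat\mu = \frac{\mathbf{1}'\boldsymbol{\Omega}\mathbf{Y}}{\mathbf{1}'\boldsymbol{\Omega}\mathbf{1}} = \frac{\sum\sum\Omega_{ij}Y_j}{\sum\sum\Omega_{ij}}, \qquad \hat\sigma = \frac{\boldsymbol{\alpha}'\boldsymbol{\Omega}\mathbf{Y}}{\boldsymbol{\alpha}'\boldsymbol{\Omega}\boldsymbol{\alpha}} = \frac{\sum\sum\Omega_{ij}\alpha_iY_j}{\sum\sum\Omega_{ij}\alpha_i\alpha_j}. \tag{5·1}$$

Their variances are

$$\mathrm{var}\,(\hat\mu) = \frac{\sigma^2}{\mathbf{1}'\boldsymbol{\Omega}\mathbf{1}} = \frac{\sigma^2}{\sum\sum\Omega_{ij}}, \qquad \mathrm{var}\,(\hat\sigma) = \frac{\sigma^2}{\boldsymbol{\alpha}'\boldsymbol{\Omega}\boldsymbol{\alpha}} = \frac{\sigma^2}{\sum\sum\Omega_{ij}\alpha_i\alpha_j}, \tag{5·2}$$

and their covariance is zero. In these formulae the ‘$\sum\sum$’ denotes summation with respect
to both suffices.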

It may be noted that the expected values of unordered observations are independent of 
the scale parameter. If we use unordered observations, therefore, the method of least 
squares is incapable of providing an optimum linear estimate of the scale parameter; it is 
certainly possible to find an unbiased quadratic estimate of the population variance, but 
this will not necessarily have any optimum properties. 


6. AN EXAMPLE: THE RECTANGULAR DISTRIBUTION 


A simple illustration which is, nevertheless, of considerable interest is provided by the
well-known case of the rectangular distribution. We take the parameters $\mu$ and $\sigma$ to be the
mean and the standard deviation of the population.

The expectation vector and the variance matrix of the ordered observations can be 
evaluated explicitly, and we easily find, for a sample of n, 


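$$\alpha_r = \sqrt{3}\,(2r-n-1)/(n+1),$$

and

$$\omega_{rs} = 12r(n-s+1)/(n+1)^2(n+2) \qquad (r \le s).$$

The inverse $\boldsymbol{\Omega}$ of $\boldsymbol{\omega}$ may be shown by induction to be given by

$$\boldsymbol{\Omega} = \frac{(n+1)(n+2)}{12}\begin{pmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{pmatrix}.$$

We have

$$\mathbf{1}'\boldsymbol{\Omega} = \frac{(n+1)(n+2)}{12}\,(1, 0, 0, \ldots, 0, 1),$$

and

$$\boldsymbol{\alpha}'\boldsymbol{\Omega} = \frac{(n+1)(n+2)\sqrt{3}}{12}\,(-1, 0, 0, \ldots, 0, 1),$$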
















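whence

$$\hat\mu = \tfrac{1}{2}(Y_1+Y_n), \qquad \hat\sigma = \frac{1}{2\sqrt{3}}\,\frac{(n+1)}{(n-1)}\,(Y_n - Y_1).$$

The sampling variances of these estimates are

$$\mathrm{var}\,(\hat\mu) = \frac{6\sigma^2}{(n+1)(n+2)}, \qquad \mathrm{var}\,(\hat\sigma) = \frac{2\sigma^2}{(n-1)(n+2)};$$

the covariance is, of course, zero.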
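The rectangular results can be confirmed numerically. The sketch below builds $\boldsymbol{\omega}$ from the formula for $\omega_{rs}$, takes $\boldsymbol{\Omega}$ to be $(n+1)(n+2)/12$ times the tridiagonal matrix with 2 on the diagonal and −1 beside it, checks in exact fractions that the product is the unit matrix, and evaluates the two estimates for a small sample. The sample values and function names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt


def omega(n):
    """Variance matrix of the ordered standardized rectangular observations."""
    return [[Fraction(12 * min(r, s) * (n - max(r, s) + 1), (n + 1) ** 2 * (n + 2))
             for s in range(1, n + 1)] for r in range(1, n + 1)]


def omega_inverse(n):
    """Tridiagonal inverse, scaled by (n+1)(n+2)/12."""
    scale = Fraction((n + 1) * (n + 2), 12)
    return [[scale * (2 if r == s else -1 if abs(r - s) == 1 else 0)
             for s in range(n)] for r in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rectangular_estimates(sample):
    """Ordered estimates of the mean and standard deviation of a rectangular parent."""
    y = sorted(sample)
    n = len(y)
    mu_hat = (y[0] + y[-1]) / 2
    sigma_hat = (n + 1) * (y[-1] - y[0]) / (2 * sqrt(3) * (n - 1))
    return mu_hat, sigma_hat


n = 6
identity = [[1 if r == s else 0 for s in range(n)] for r in range(n)]
print(matmul(omega(n), omega_inverse(n)) == identity)
print(sum(map(sum, omega_inverse(n))))        # 1' Omega 1 = (n+1)(n+2)/6
print(rectangular_estimates([0.2, 0.9, 0.5]))
```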

Our estimates thus have sampling variances of order $n^{-2}$, which is in striking contrast to
the results obtained from moment estimates calculated from the unordered observations.
For example, if we estimate $\mu$ by means of the sample mean m, and $\sigma$ by means of the sample
standard deviation s, where

$$ns^2 = \sum (X_i - m)^2,$$

we have sampling variances of order $n^{-1}$ only:

$$\mathrm{var}\,(m) = \sigma^2/n,$$

$$\mathrm{var}\,(s) = \sigma^2/5n + O(n^{-2}).$$


These variances are larger than those of the ‘ordered’ estimates by a whole order of 
magnitude. 

Unfortunately, one cannot compare the sampling variances of the ‘ordered’ estimates
with the values given by the Cramér-Rao inequality, since this does not apply to
distributions (such as the rectangular) which have discontinuity points at positions determined by
the parameters.

It might be noted that attention was drawn to the low asymptotic value $6\sigma^2/n^2$ of var ($\hat\mu$) by
R. A. Fisher, as long ago as 1921 (Fisher, 1921). A few years later Neyman and Pearson
(1928) gave a detailed discussion of the sample centre and range.

To conclude the discussion of this example, we might perhaps think it more natural in
the case of the rectangular distribution to estimate the extremities $\lambda_1$ and $\lambda_2$ rather than the
mean and the standard deviation. These may be directly estimated by a similar procedure
to the above, or, alternatively, we may use the fact that their least-squares estimates
$\hat\lambda_1$ and $\hat\lambda_2$ are connected with $\hat\mu$ and $\hat\sigma$ by the same linear relationships which connect the
corresponding parameters. Since

$$\lambda_1 = \mu - \sigma\sqrt{3}, \qquad \lambda_2 = \mu + \sigma\sqrt{3},$$

the estimates are

$$\hat\lambda_1 = Y_1 - (Y_n - Y_1)/(n-1),$$

$$\hat\lambda_2 = Y_n + (Y_n - Y_1)/(n-1),$$

with

$$\mathrm{var}\,(\hat\lambda_1) = \mathrm{var}\,(\hat\lambda_2) = n(\lambda_2-\lambda_1)^2/(n^2-1)(n+2),$$

and

$$\mathrm{cov}\,(\hat\lambda_1, \hat\lambda_2) = -(\lambda_2-\lambda_1)^2/(n^2-1)(n+2).$$

The corresponding unbiased estimate of the range is

$$\hat\lambda_2 - \hat\lambda_1 = (Y_n - Y_1)(n+1)/(n-1),$$

with sampling variance $2(\lambda_2-\lambda_1)^2/(n-1)(n+2)$.


Finally, we remark that $\hat\lambda_1$ and $\hat\lambda_2$ are jointly sufficient estimators of $\lambda_1$ and $\lambda_2$.













7. CONDITIONS UNDER WHICH $\hat\mu$ HAS A SMALLER VARIANCE THAN THE SAMPLE MEAN


Since the sum $\sum Y_i$ of the ordered observations is the same as the sum $\sum X_i$ of the unordered
observations, the sample mean may be defined equally as $\sum X_i/n$ and as $\sum Y_i/n$. It is therefore
an unbiased linear compound of the ordered observations. Our estimate $\hat\mu$ has, by
construction, minimal variance in the class of such estimates, so that its variance is at most
equal to that of the sample mean. Hence

$$\mathrm{var}\,(\hat\mu) \le \sigma^2/n. \tag{7·1}$$


It is of interest to investigate the conditions under which this relation is a strict inequality. 
That such a situation can in fact exist is demonstrated by the example of the rectangular 
distribution. 

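We first note that the variance matrix $\mathbf{M}$ of any non-degenerate distribution is
symmetric and positive-definite. The symmetry property is evident; if $Z_1, Z_2, \ldots, Z_n$ are
the variates involved,

$$M_{ij} = \mathrm{cov}\,(Z_i, Z_j) = \mathrm{cov}\,(Z_j, Z_i) = M_{ji}.$$

To demonstrate the positive-definiteness consider the variate

$$\lambda_1Z_1 + \lambda_2Z_2 + \ldots + \lambda_nZ_n.$$

The variance of this expression is a quadratic form in the $\lambda_i$, with matrix $\mathbf{M}$. This
quadratic form, being a variance, is necessarily positive for all non-zero $\boldsymbol{\lambda}$. Hence $\mathbf{M}$ is
positive-definite.

The variance matrix $\boldsymbol{\omega}$ of our ordered observations $\mathbf{V}$ is thus symmetric and positive
definite. It may therefore be written in the form

$$\boldsymbol{\omega} = \mathbf{t}\mathbf{t}', \tag{7·2}$$

where $\mathbf{t}$ is a lower triangular matrix, i.e. $t_{ij} = 0$ for i < j. (This is the Choleski resolution.
See, for example, Fox, Huskey & Wilkinson (1948).)


Consider now the sum $\sum U_i$ of the unordered standardized observations; its variance is
clearly equal to n. Noting that

$$\sum U_i = \sum V_i = \mathbf{1}'\mathbf{V}$$

in matrix notation, we see that

$$n = \mathrm{var}\,(\mathbf{1}'\mathbf{V}) = \mathbf{1}'\boldsymbol{\omega}\mathbf{1} = \mathbf{1}'\mathbf{t}\mathbf{t}'\mathbf{1} = \mathbf{h}'\mathbf{h} = \sum h_i^2, \tag{7·3}$$

say, where $\mathbf{h} = \mathbf{t}'\mathbf{1}$.
Similarly, we find

$$\mathbf{1}'\boldsymbol{\Omega}\mathbf{1} = \mathbf{1}'(\mathbf{t}^{-1})'\mathbf{t}^{-1}\mathbf{1} = \mathbf{k}'\mathbf{k} = \sum k_i^2, \tag{7·4}$$

say, where $\mathbf{k} = \mathbf{t}^{-1}\mathbf{1}$.
Further,

$$\sum h_ik_i = \mathbf{h}'\mathbf{k} = \mathbf{1}'\mathbf{t}\mathbf{t}^{-1}\mathbf{1} = \mathbf{1}'\mathbf{1} = n. \tag{7·5}$$

Now by Schwarz’s inequality we have

$$\sum k_i^2 \ge \left(\sum h_ik_i\right)^2 \Big/ \sum h_i^2, \tag{7·6}$$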














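whence, using (7·3) and (7·5), $\sum k_i^2 \ge n$,
or $\mathbf{1}'\boldsymbol{\Omega}\mathbf{1} \ge n$.
But since var ($\hat\mu$) = $\sigma^2/\mathbf{1}'\boldsymbol{\Omega}\mathbf{1}$, this inequality reduces to

$$\mathrm{var}\,(\hat\mu) \le \sigma^2/n,$$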


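which is, of course, (7·1) again. We are now, however, in a position to state conditions under
which this relation becomes an equation. By the well-known properties of Schwarz’s
inequality, the equality sign in (7·6) applies if and only if the $k_i$ are proportional to the $h_i$, say

$$k_i = bh_i,$$

for some constant b. When this is so, it follows from (7·4) and (7·5) that b = 1.
Thus (7·6) is an equation if and only if

$$\mathbf{k} = \mathbf{h},$$

or

$$\mathbf{t}^{-1}\mathbf{1} = \mathbf{t}'\mathbf{1},$$

or, since $\mathbf{t}\mathbf{t}' = \boldsymbol{\omega}$,

$$\boldsymbol{\omega}\mathbf{1} = \mathbf{1}. \tag{7·7}$$

The interpretation of this condition is that, in each row (and, by symmetry, each column)
of the matrix $\boldsymbol{\omega}$, the sum of the elements is unity.

When the condition holds, not only is the variance of $\hat\mu$ equal to that of the sample mean,
the estimate $\hat\mu$ in fact becomes the sample mean. For (7·7) implies that

$$\boldsymbol{\Omega}\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}'\boldsymbol{\Omega}\mathbf{1} = n,$$

so that

$$\hat\mu = \frac{\mathbf{1}'\boldsymbol{\Omega}\mathbf{Y}}{\mathbf{1}'\boldsymbol{\Omega}\mathbf{1}} = \frac{\mathbf{1}'\mathbf{Y}}{n} = \frac{\sum Y_i}{n}.$$

When the condition is not satisfied,

$$\mathrm{var}\,(\hat\mu) < \sigma^2/n.$$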


Thus, when the row totals of the variance matrix of the ordered observations are all equal,
the ‘ordered’ estimate $\hat\mu$ coincides with the sample mean; when, however, the row totals
of the variance matrix are not all equal, $\hat\mu$ is a better estimate than the sample mean
(‘better’ in the sense of having smaller sampling variance).

An example of the latter case has already been discussed. An example of the former case 
is provided by the normal distribution. 
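The quantities entering the argument of this section can be computed for the rectangular case, where the inequality is strict. The sketch below forms the Choleski factor $\mathbf{t}$ of $\boldsymbol{\omega}$ in floating point, then $\mathbf{h} = \mathbf{t}'\mathbf{1}$ and $\mathbf{k} = \mathbf{t}^{-1}\mathbf{1}$, and returns $\sum h_i^2$, $\sum k_i^2$ and $\sum h_ik_i$. The rectangular $\boldsymbol{\omega}$ is taken from the formula for $\omega_{rs}$ in §6; function names are hypothetical.

```python
from math import isclose, sqrt


def rectangular_omega(n):
    """Variance matrix of ordered observations from the standardized rectangular parent."""
    return [[12 * min(r, s) * (n - max(r, s) + 1) / ((n + 1) ** 2 * (n + 2))
             for s in range(1, n + 1)] for r in range(1, n + 1)]


def cholesky(w):
    """Lower triangular t with w = t t'."""
    n = len(w)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(t[i][k] * t[j][k] for k in range(j))
            if i == j:
                t[i][i] = sqrt(w[i][i] - s)
            else:
                t[i][j] = (w[i][j] - s) / t[j][j]
    return t


def forward_solve(t, b):
    """Solve t x = b for lower triangular t."""
    x = []
    for i, row in enumerate(t):
        x.append((b[i] - sum(row[j] * x[j] for j in range(i))) / row[i])
    return x


def schwarz_terms(w):
    """Return (sum h^2, sum k^2, sum h k) with h = t'1 and k = t^{-1} 1."""
    n = len(w)
    t = cholesky(w)
    h = [sum(t[i][j] for i in range(n)) for j in range(n)]   # column totals of t
    k = forward_solve(t, [1.0] * n)
    return (sum(x * x for x in h), sum(x * x for x in k),
            sum(x * y for x, y in zip(h, k)))


n = 5
w = rectangular_omega(n)
print(schwarz_terms(w))              # approximately (n, (n+1)(n+2)/6, n)
print([sum(row) for row in w])       # row totals of omega are not all unity
```

For n = 5 the middle term is 7 against n = 5, so that var ($\hat\mu$) = $\sigma^2/7$ is smaller than $\sigma^2/5$, and the row totals of $\boldsymbol{\omega}$ are visibly unequal.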


8. THE NORMAL DISTRIBUTION 


It is known from general theory that no estimator of the mean of the normal distribution 
can be better (in the above sense) than the sample mean. The ordered estimate must therefore 
coincide with the sample mean for this distribution. 

(This, incidentally, proves that, for the standardized normal distribution, every row of 
the variance matrix of the ordered observations adds up to unity.) 







The ‘ordered’ estimate of the standard deviation, however, cannot be dealt with so
easily; the expected values, variances and covariances of the ordered observations have to
be computed before the formulae (5·1, 5·2) can be used. Godwin (1949b) has tabulated the
required figures for sample sizes up to n = 10, and, in another paper (1949a), has compared
the efficiencies of the ‘ordered’ estimate with various other estimates of $\sigma$, the efficiency
of the ordered estimate, for n ≤ 10, being at least 98.83 %.


I am grateful to Prof. M. G. Kendall for suggesting a number of improvements to the 
original draft of this paper. 


REFERENCES 
AITKEN, A. C. (1935). On least squares and linear combination of observations. Proc. Roy. Soc. Edinb.
55, 42.

FISHER, R. A. (1921). On the mathematical foundations of theoretical statistics. Philos. Trans. A,
222, 309.

FOX, L., HUSKEY, H. D. & WILKINSON, J. H. (1948). Notes on the solution of algebraic linear
simultaneous equations. Quart. J. Mech. Appl. Math. 1, 149 (esp. pp. 159 ff.).

GODWIN, H. J. (1949a). On the estimation of dispersion by linear systematic statistics. Biometrika,
36, 92.

GODWIN, H. J. (1949b). Some low moments of order statistics. Ann. Math. Statist. 20, 279.

NEYMAN, J. & PEARSON, E. S. (1928). On the use and interpretation of certain test criteria.
Biometrika, 20A, 175.













REGRESSION, STRUCTURE AND FUNCTIONAL 
RELATIONSHIP. PART II 


By M. G. KENDALL 
Division of Research Techniques, London School of Economics 


1. In the first part of this paper I reviewed the theory of regression and the determination
of functional relationship when both variables are subject to error. In this continuation
I examine some of the problems arising in the structural analysis, but before doing so add
a few further comments on the problem of determining functional relationship.


FURTHER REMARKS ON FUNCTIONAL RELATIONSHIP 


2. In 1940 Wald propounded a new solution of the problem of fitting a straight line to
data when both variables are subject to error. He obtained estimators and confidence
intervals for the parameters $\alpha_0$ and $\alpha_1$ in the equation

$$Y = \alpha_0 + \alpha_1X, \tag{1}$$

where, as in Part I, the variables Y and X are subject to errors of observation $\epsilon$ and $\eta$. Wald
made the usual assumptions about the independence of $\epsilon$ and $\eta$ and of successive values of
each. The essence of the method is to divide the observations (assumed even in number,
say, 2m) into two groups of m and to construct as an estimator of $\alpha_1$ the statistic

$$a_1 = \frac{\sum_{j=1}^{m} y_j - \sum_{j=m+1}^{2m} y_j}{\sum_{j=1}^{m} x_j - \sum_{j=m+1}^{2m} x_j}, \tag{2}$$
x and y being the observed values of X and Y. He also requires a new type of assumption;
the limit inferior of

$$\frac{1}{m}\left|\sum_{j=1}^{m} E(x_j) - \sum_{j=m+1}^{2m} E(x_j)\right| \qquad (m \to \infty) \tag{3}$$

is positive. Under these conditions the estimator (2) is consistent as m increases. It appears
at first sight as if Wald’s solution contradicts the statement of §42 (Part I) to the effect that
no solution is possible without some assumption concerning the relative magnitudes of the
errors. But condition (3) is really equivalent to some assumption of this kind. We have, in
the general case, no reason to suppose that it is satisfied, for $x_1, \ldots, x_m$ can be any of the
2m values of x observed. If we take them to be the smallest values they are not independent
of the errors of observation; and if we take them as the values with the smallest expectations
we require to know those expectations, or at least to know that the expectations obey (3).
To decide this point by inspection of the data is not always possible; and Wald himself
notes (pp. 297-8) a remark by Hotelling that if $\epsilon$, $\eta$ are normal and the values X are
themselves chosen randomly from a normal population, the expression (3) converges in
probability to zero for any set of x’s which is defined independently of the error.*
* Geary (1942) observes in a footnote on p. 69 that Wald’s method is applicable when the ‘variables’
are distributed on the normal surface of error, but I think he has overlooked this remark; and, indeed,
the implications of the condition (3) are very far from obvious.
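The statistic (2) is simple to compute. The sketch below assumes the 2m paired observations have already been arranged so that the first m form one group and the last m the other, by some rule defined independently of the errors, which is exactly the point at issue in the text. The function name and data are illustrative.

```python
def wald_slope(x, y):
    """Grouping estimator of the slope: difference of group sums of y over that of x."""
    if len(x) != len(y) or len(x) % 2:
        raise ValueError("need an even number of paired observations")
    m = len(x) // 2
    numerator = sum(y[:m]) - sum(y[m:])
    denominator = sum(x[:m]) - sum(x[m:])
    if denominator == 0:
        raise ZeroDivisionError("the two group sums of x coincide")
    return numerator / denominator


# Observations lying exactly on Y = 1 + 2X recover the slope.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
print(wald_slope(x, y))
```

When the two group sums of x are nearly equal the denominator is close to zero and the estimate is unstable, which is the practical content of condition (3).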







3. Nevertheless, Wald’s paper is an important contribution. He showed that whereas
in earlier treatments a knowledge of the parent parameters var $\epsilon$ and var $\eta$ is required (or
at least a knowledge of their ratio), a much more general assumption of the type associated
with expression (3) will suffice to provide consistent estimators in cases where we can
objectively split the values X into two groups obeying it. From a somewhat similar point
Theil (1950) has recently developed a theory of distribution-free regression. Speaking
somewhat loosely, one might say that the problem is resolved if the errors in the x-variable
are not large enough to disturb the orders of the variates $\sum_{1}^{m} x_j$ and $\sum_{m+1}^{2m} x_j$, i.e. that the first
is or is not greater than the second according as the expectation $\sum_{1}^{m} E(x_j)$ is or is not greater
than $\sum_{m+1}^{2m} E(x_j)$. In a more refined model Theil requires the assumption that

$$E(x_1),\ E(x_2),\ \ldots,\ E(x_{2m})$$

are in the same order (or the same order inverted) as $x_1, \ldots, x_{2m}$, which is always so if the
observational errors are less than the intervals between the expectations.

The problem of fitting functional relations is thus soluble under certain assumptions 
concerning the order of the values of the underlying variables. Such assumptions are often 
plausible or can be made so by spacing out the observations; but they have to be made 
nevertheless. 


4. Berkson (1950) has recently claimed to be able to derive an unbiased estimator of
the constants in a regression relationship where the variates are both subject to error.
It would appear as if his results (pp. 170-3) are equivalent to equations (42) and (46)
of Part I of this paper. But it seems to me that he applies them in cases where I should
regard their uses as unwarranted, namely, in the estimation of constants in linear
relationships. In the notation of §40, Part I, suppose we have two observables, x′, y′, given by

$$x'_X = X + v_X, \qquad y'_Y = Y + w_Y. \tag{4}$$

If we estimate $\alpha_1$ in equation (1) by least squares applied to x′ and y′ we find

$$a_1 = \sum y'x' \Big/ \sum x'^2. \tag{5}$$

Berkson now regards x′ as ‘preassigned’ or, as I should say, fixed, and argues in this way
(p. 173): we have, from (4),

$$y' = Y + w = \alpha_0 + \alpha_1X + w = \alpha_0 + \alpha_1x' - \alpha_1v + w.$$

Hence

$$E(a_1) = E\left[\left\{\sum\alpha_0x' + \alpha_1\sum x'^2 - \alpha_1\sum x'v + \sum wx'\right\} \Big/ \sum x'^2\right].$$

For fixed x′

$$E\sum x' = 0 \ \text{(by convention)}, \qquad E\sum x'v = 0 \ \text{(for all } X\text{)}, \qquad E\sum x'w = 0 \ \text{(for all } X, Y\text{)}, \tag{6}$$

and hence, from the expression for $E(a_1)$,

$$E(a_1) = \alpha_1. \tag{7}$$




5. Apparently Berkson regards this result as valid in estimating $\alpha_1$ in an experiment
where the x′ are assigned values. By assigning values to a random variate he seems to mean
that if we fix x′ we do not fix X, but commit an error $v_X$ in attempting to do so. Thus, in
an experiment on dosage mortality, if we weigh out what purports to be 2 g. of a drug, we
may commit an error; but if we nevertheless carry out an estimation by equation (5) as
if no error were committed, we shall still get an unbiased estimation, the true value (actual
weight X) being unknown and the error $v_X$ being unknown, but their sum ex hypothesi being
2 g., the observed x′. All that is required is that equations (6) shall be satisfied.

But this requirement seems to me to be incapable of being satisfied. If we fix $x'_X$ and still
require $v_X$ to be a variate then X must be a variate, which is impossible. If $x'_X$ is not fixed

$$E(x'_Xv_X) = E(Xv_X + v_X^2),$$

which cannot vanish if $E(v_X) = 0$.


6. Berkson, as it seems to me, inadvertently confuses the issue by entitling his paper
‘Are there two regressions?’ (Frisch does the same by speaking of a ‘true regression’ instead
of a functional relationship.) The answer, in my terminology, is that there are certainly
two regressions. If we are estimating constants in a linear relationship (which is nothing to
do with regression analysis in the classical sense) and use a least-squares technique, then
we cannot obtain an unbiased estimator, when there is experimental error in the predicated
variable, unless we introduce auxiliary assumptions. The uniqueness of the linear
relationship is not in question.*


7. Perhaps it may remove yet another source of confusion if I refer at this point to some
work by Geary (1942, 1943). Geary is concerned with inherent relations between variates.
If we have a number of variates x subject to errors (of observation, say) $\epsilon$, and if the $\epsilon$’s are
independent of the x’s and of each other then the product-cumulants of the set x′ (= x + $\epsilon$)
are the same as those of the x’s. If the x’s are related by equations such as

$$\sum_{j=1}^{k} \alpha_jx_j = \text{a constant}, \tag{8}$$

then it is possible to write down linear equations in the $\alpha$’s and the product-cumulants,
which can be calculated from the observed x′. This gives estimates of the $\alpha$’s in terms of
observable quantities without making any demands on the nature of the error except the
usual conditions of independence. Unfortunately, the method breaks down, as Geary
observes, when the x’s are jointly normal because the cumulants all vanish for order greater
than 2, or if they are independent, for then the product-cumulants all vanish. It is to be
noted also that if x′ is normal and x, $\epsilon$ are independent, then each must also be normal. The
method applies, then, only where the errors or the x’s are not normally distributed. It is true
that most distributions are not exactly normal, but if they are close to normality a very
large sample would be required to detect any deviation. But, in any case, Geary’s method
applies only to cases of relationships between variates. Where functional relations are
concerned there are no cumulants of the unobservable variables.
Geary’s results, in fact, relate to structural equations, which I now consider.


* See the note at the end of this paper.


STRUCTURAL RELATIONS
8. In studying the joint variation of a complex of variables or variates we often wish to
consider the case when some of them are connected by functional relationships, particularly
of the linear type (since these are simpler to handle). We may have functional relations
between variables of the kind represented in equation (1), or functional relations between
variates such as

$$y = \beta_0 + \beta_1x. \tag{9}$$

These relations are usually suggested by some prior theoretical analysis. I have noted that
we cannot have mixed equations such as

$$y = \beta_0 + \beta_1X, \tag{10}$$

although we may have, for example,

$$y_X = \beta_0 + \beta_1X + \epsilon_X, \tag{11}$$

which means that the variate y, distributed in a form which is dependent on some
variable X, is linearly related to another variate $\epsilon$ which may or may not depend on X.


9. In writing down these relations we may follow two quite different courses. The point
may be illustrated by a simple economic example in which we postulate that quantity sold
($Q$) is linearly related to price ($P$) by a relation

$$Q = \alpha_0 + \alpha_1 P. \tag{12}$$

This is an equation suggested, as a rough approximation, by economic analysis, which
further suggests, and indeed almost requires, that the sign of $\alpha_1$ should be negative. Suppose
that we wish to compare this equation with the results of a set of observations on the
quantity of a commodity sold at various prices. We do not expect that for each pair of
observations equation (12) will be exactly obeyed. Wherein lies the source of the error?
The first possible procedure, called by Geary (1948) the 'error-in-variable' approach, is to
suppose that our observations on $Q$ and $P$ are affected by errors of observation so that instead
of given values of $Q$ and $P$ we observe, say, values of $Q + \epsilon_Q$ and $P + \eta_P$. In short, we have
the case of a functional relationship with observations subject to error. The second
procedure is to regard equation (12) as a condensed version of an equation from which a number
of variables (such as income, prices of substitute commodities and so forth) have been
omitted, so that the true equation should be the functional relation

$$Q = \alpha_0 + \alpha_1 P + f(U_1, U_2, \ldots). \tag{13}$$

We further suppose that the function $f$ behaves like a stochastic error term and hence wish
to write something representing an 'error-in-equation' approach such as

$$Q = \alpha_0 + \alpha_1 P + \epsilon. \tag{14}$$

Unfortunately, this is a meaningless expression, but we can transform it into something
possessing a meaning by writing

$$q = \beta_0 + \beta_1 p + \epsilon \tag{15}$$

or

$$q_P = \beta_0 + \beta_1 P + \epsilon_P. \tag{16}$$

In the first of these we have completely altered the model. The function $f(U_1, U_2, \ldots)$ can only
be represented as a random variate if we postulate something about the class of values which
we shall observe in practice. We are therefore imposing a requirement on the behaviour of
the events we are to consider. Our quantities and prices are now themselves random
variates and we have to imagine them as possessing a frequency distribution. The alternative
model, represented by (16), allows the quantity to be a random variable for any fixed price.













100 Regression, structure and functional relationship. Part II 


We note, however, that (16) is an equation such as we should derive as a conditional
distribution by fixing $P$ in (15), and if $\epsilon_P$ has zero mean

$$E(q_P) = \beta_0 + \beta_1 P, \tag{17}$$

which is the regression of $q$ on $p$ in (15).

10. We may call a set of equations in variables or variates a structural set, and describe
the analysis of them as a structural analysis. The distinction between regression and structure
is that in the first we are interested in functional relationship between means of dependent 
variates and predicated variables, as in equation (17), whereas in the second we are interested 
in relationships between variates, as in equation (9). We may also admit into a structural 
set equations such as (16) where the variates depend on a variable entering explicitly into 
the equations. These distinctions may, perhaps, seem somewhat overdrawn, but they are 
worth making because of the differences in technique which are required to analyse the 
situations to which they correspond. 


11. At this point some further definitions are desirable: 

Endogenous and exogenous elements.* In dynamic economics, and indeed in any quanti- 
fiable system under scientific examination, we have certain elements which are inherent in 
the system and others which represent the influence of factors impinging on the system from 
without. Prices, quantities produced, and income are inherent in the economic system and 
are endogenous. Rainfall, temperature and warfare are exogenous. In a laboratory experiment 
on Boyle’s law pressure and volume are endogenous; temperature, magnetic forces, and 
earth tremors, are exogenous. The classification is a useful one, but I am doubtful whether 
it would stand up to logical analysis unless we are prepared to postulate something akin 
to causality. It is reasonable to suppose that rainfall may affect demand but demand will 
not affect rainfall, or that earth tremors will affect our instrumental readings, but our 
instruments will not generate earthquakes. The difficulty is that if the two elements, 
exogenous and endogenous, are concomitant, no statistical analysis can distinguish which 
is causal to the other. The distinction between their natures must therefore be made on 
prior, non-statistical grounds. Fortunately, it does not seem necessary to attempt to make 
the distinction very precise. For present purposes it is enough to remark that exogenous 
variables are observable; or so I understand it, for if they were not they would be con- 
solidated into an error term. 

Predetermined elements. In some circumstances, at least, then, we may regard the 
exogenous elements as predetermined like a fixed predicated variate. There is a second 
class of elements which are essentially predetermined, namely, those endogenous variates 
which are lagged. In an equation giving production at time $t$ in terms of price at time $t-1$
we may regard price as a known element. If production at time $t$ depended on price at time
$t$ and price at time $t-1$, the latter would be predetermined but the price at time $t$ would not,
unless the form of relationship were such that the price at $t$ were regarded as predicated.

Latent elements. In the equations determining the behaviour of the system there will 
appear certain elements which are not observable, or at least are not observed. They appear 
explicitly in the formulation of the problem, but statistical evidence concerning them is 
indirect. In the terminology of Koopmans (see, for instance, Koopmans & Reiersøl, 1950)
the unobservable variates are said to be latent. As I shall use the term, error variates, though 
unobservable, are not latent. Latency refers only to those variates (like g in factor analysis 


* I use the word 'element' in this context to denote indifferently variables or variates.







or demand in economics) which are supposed to measure some real quantifiable element in 
the system. 


12. Thus, in the light of prior analysis or information about the system, we may write
down a set of equations which determine its motion. For example, if we take the
error-in-equation approach, with variates not subject to experimental error, we have in the notation
of Monograph 10 of the Cowles Commission (Koopmans, 1950) the typical equation

$$\sum_{i=1}^{G} \sum_{\tau=0}^{\tau'} \beta_{i\tau}\, y_i(t-\tau) + \sum_{k=1}^{K} \sum_{\tau=0}^{\tau'} \gamma_{k\tau}\, z_k(t-\tau) = u(t), \tag{18}$$

where the $y$'s are endogenous and the $z$'s are exogenous variates. There are, say, $g$ equations
of this kind. $u(t)$ is the 'error' term in the equation, which is known as the disturbance.

We may write equation (18) more simply. First of all, if $D$ is an operator lagging the time
element by unity, i.e. if

$$D f(t) = f(t-1),$$

we may replace $\sum_{\tau} \beta_{i\tau}\, y_i(t-\tau)$ by $\beta_i\, y_i(t)$, where $\beta_i$ is now a polynomial in $D$. Secondly, using
a matrix notation, we may write the set of equations such as (18) in the form

$$B y + \Gamma z = u, \tag{19}$$

where $B$ is a $g \times G$ matrix, $\Gamma$ is a $g \times K$ matrix and $y$, $z$, $u$ are column vectors (matrices of
orders $G \times 1$, $K \times 1$, $g \times 1$, respectively). If $G = g$ (and we need not seriously consider any
other case, which would imply either that we have more latent variables than equations or
vice versa) the solution for the $y$'s in terms of $z$'s and $u$'s is then simply

$$y = -B^{-1} \Gamma z + B^{-1} u, \tag{20}$$

where $B^{-1}$ means the function of the coefficients $\beta_i$ together with the operator $D$ obtained
by inverting the matrix as if $D$ were an algebraic quantity and then expanding any
denominator terms in $D$ as an infinite series of powers of $D$ by binomial processes. We
require for the validity of (20) that $B_0$, the matrix of terms independent of $D$, shall be
non-singular.
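The passage from (19) to (20) may be illustrated in the simplest case, with no lags, so that $B$ is an ordinary numerical matrix. The sketch below assumes $G = g = 2$ and $K = 1$; the helper names and all numerical values are hypothetical and chosen only for illustration.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("B is singular, so (20) does not exist")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a matrix (nested lists) and a vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def reduced_form(B, Gamma, z, u):
    """Solve B y + Gamma z = u for y, i.e. y = -B^{-1} Gamma z + B^{-1} u."""
    gz = matvec(Gamma, z)
    rhs = [ui - gi for ui, gi in zip(u, gz)]
    return matvec(inv2(B), rhs)


# Hypothetical two-equation system with one exogenous variate.
B = [[1.0, 1.0], [1.0, -1.0]]
Gamma = [[-1.0], [0.0]]
z = [2.0]
u = [0.5, -0.5]

y = reduced_form(B, Gamma, z, u)

# Substituting y back into (19) should reproduce the disturbances u.
check = [byi + gzi for byi, gzi in zip(matvec(B, y), matvec(Gamma, z))]
print(y, check)
```

When lags are present the entries of $B$ are polynomials in $D$ and the inversion must be carried out symbolically, as described above; the numerical case shown corresponds to the matrix $B_0$ alone.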


13. It is, however, of some importance to remark that the object of considering these
equations is not to solve for the latent elements $y$ but to estimate the constants $\beta$ and $\gamma$. To
that extent the problem resembles the determination of regression coefficients or constants
in a functional relationship. It may fairly be asked why, when faced with a stochastic
complex of this type, we do not treat it simply as a multiple regression problem expressing
each of the endogenous unlagged elements in terms of predetermined elements. I return
to this point in § 28.


14. The disturbances u(t) are not observable. It is supposed, in general, that they have 
a joint probability distribution with finite moments up to at least the second order. In 
simpler models it is also assumed that they are independent among themselves, that 
successive values of any one are independent and that they are independent of the 
$K$ exogenous elements. In any case, of course, it follows that the $G$ endogenous elements
are also random variates connected with the $u$'s by $g$ equations of type (18), provided that
we remember that any endogenous element with $\tau > 0$ is fixed at time $t$.

15. The general problem of structural analysis may then be stated as follows: on prior 
grounds we write down equations of type (18) which we regard (at least to a first approxima- 
tion) as determining the system. We next have to examine whether the system is capable 













of solution in the sense that it permits of the estimation of any or all of the parameters
$\beta$ and $\gamma$. If it does, we perform the estimation and consider whether the values obtained are
reasonable in the light of prior knowledge. If they are, the resulting model fits the data
and we can proceed to test it on new data by examining its value in prediction or otherwise.
If the data do not fit the model, or if it fails as a predictor, we have to begin again and to
revise the system of equations.


16. This leads us to a discussion of 'identifiability', that is to say, to the possibility of
uniquely determining the parameters $\beta$ and $\gamma$. Four cases arise:


(a) no parameters can be determined. The system is then said to be completely 
unidentifiable; 

(b) some but not all parameters can be determined. The system is then partially
identifiable and those parameters which can be determined are identifiable;

(c) all the parameters can be determined and there are only just enough equations to 
enable determination to be carried out. The system is then completely identifiable. 

(d) there are more elements present than are necessary for complete identifiability. The 
system is then over-identified. (This does not mean that the equations are inconsistent 
among themselves, but only that we could do without some of the exogenous elements and 
still have an identifiable system.) 


17. Identifiability is not a matter of sample size. Even if we have an indefinitely large 
sample and for all practical purposes know the distribution functions of the disturbances, 
it may still arise. In fact, it is not, in full generality, solely a statistical problem; but I shall 
be concerned with it only in a statistical context. 

To take a case so simple as to be almost trivial: suppose we have a population which is
composed of the sum of two independent variates $y$ and $z$, each with known mean but
unknown variance. It is clear that, however many observations we take on the sum $y + z$,
we can determine only the variance of that sum, say $v_1 + v_2$, and cannot determine $v_1$ and $v_2$
separately. These two parameters are then unidentifiable.

Consider now the economic case wherein we have a demand equation expressing the
demand-price ($d$-$p$) relationship

$$d - \beta_0 - \beta_1 p = \epsilon, \tag{21}$$

a supply equation expressing the supply-price ($s$-$p$) relationship

$$s - \gamma_0 - \gamma_1 p = \eta, \tag{22}$$

and, for economic reasons, identify both supply and demand with quantity sold:

$$s = d = q. \tag{23}$$

My personal opinion is that the economist is creating a difficulty for himself by adopting
equation (23) in conjunction with (21) and (22), but let us pass that over. At this stage it is
enough to observe that the practice is generally accepted and leads to an identification
problem. In fact we have

$$\begin{aligned} q - \beta_0 - \beta_1 p &= \epsilon, \\ q - \gamma_0 - \gamma_1 p &= \eta. \end{aligned} \tag{24}$$

We postulate nothing about $\epsilon$ and $\eta$ except that they are stochastic variates. We can then
clearly form an infinite number of pairs of equations such as

$$\begin{aligned} (1+l)q - (\beta_0 + l\gamma_0) - (\beta_1 + l\gamma_1)p &= \epsilon + l\eta, \\ (1+m)q - (\beta_0 + m\gamma_0) - (\beta_1 + m\gamma_1)p &= \epsilon + m\eta, \end{aligned} \tag{25}$$












and these sets are indistinguishable from one another so far as concerns the observations,
unless we observe or postulate something about the disturbances $\epsilon$ and $\eta$. Thus we cannot
estimate the constants and the system is unidentifiable.

We may even have situations where a good deal is postulated about the disturbances
without resolving the unidentifiability. For instance, in the case of equation (24), $\epsilon$ and
$\eta$ may be known to be normal; but then so will be $\epsilon + l\eta$ and $\epsilon + m\eta$. Or they may be supposed
normal and independent; but it will still be possible to have values of $l$ and $m$ (depending
on the variances of $\epsilon$ and $\eta$) such that $\epsilon + l\eta$ and $\epsilon + m\eta$ are normal and independent. To make
the system identifiable we have either to postulate that $\epsilon$ and $\eta$ will have such distribution
laws that this kind of thing cannot happen or to introduce an exogenous variate into one of
the equations at least. I consider these possibilities in more detail below.
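The indistinguishability of the pairs (24) and (25) can be seen directly by computation. The sketch below is an illustration under assumed values only: the constants, the multipliers $l$ and $m$, and the function names are hypothetical. It solves the original pair for $(q, p)$, then builds a rival pair of the form (25), rescaled so that $q$ has unit coefficient, and confirms that the rival pair yields exactly the same observations although its slopes are quite different.

```python
import random


def solve_market(b0, b1, g0, g1, eps, eta):
    """Solve q - b0 - b1*p = eps and q - g0 - g1*p = eta for (q, p)."""
    p = (g0 - b0 + eta - eps) / (b1 - g1)
    q = b0 + b1 * p + eps
    return q, p


def combine(b0, b1, g0, g1, eps, eta, l):
    """Form (first equation) + l * (second equation), rescaled so q has coefficient 1."""
    scale = 1.0 + l
    return (b0 + l * g0) / scale, (b1 + l * g1) / scale, (eps + l * eta) / scale


rng = random.Random(0)
b0, b1, g0, g1 = 10.0, -1.0, 2.0, 1.0   # hypothetical demand and supply constants
l, m = 0.5, -0.25                        # arbitrary multipliers, l != m, neither -1

max_gap = 0.0
for _ in range(1000):
    eps, eta = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    q, p = solve_market(b0, b1, g0, g1, eps, eta)
    # A rival 'structure' built from the pair (25).
    c0, c1, e1 = combine(b0, b1, g0, g1, eps, eta, l)
    d0, d1, e2 = combine(b0, b1, g0, g1, eps, eta, m)
    q2, p2 = solve_market(c0, c1, d0, d1, e1, e2)
    max_gap = max(max_gap, abs(q - q2), abs(p - p2))

print(f"rival slopes {c1:.3f} and {d1:.3f}; largest discrepancy {max_gap:.2e}")
```

The discrepancy is zero apart from rounding, so no quantity of observations on $q$ and $p$ alone could favour the slopes $(-1, 1)$ over the rival pair.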


18. In the foregoing I have spoken of identifiability in relation to the determination of
parameters in linear equations. A more general definition, adapted so as to cover both
parametric and non-parametric specifications, is given by Koopmans & Reiersøl (1950)
following Hurwicz's paper in Monograph 10 (1950) on a 'generalization of the concept of
identification'. We begin with a specification of the system, such as a set of equations of
type (18), expressing what we suppose to be the relations between the various elements,
and containing certain unknown features, such as the constants $\beta$ and $\gamma$. A structure is
defined by Koopmans & Reiersøl as this specification together with a specification of
a particular probability distribution of the latent variates. A model is a set of structures.
More usually we think of these two ideas in the reverse order. The general specification is
called the model, and the structure is one of the particular values it assumes when the
unknown parameters and probability distributions are determined. The order of
presentation of the two ideas is not of great importance; but it is very important to observe that the
model, in this sense, specifies nothing about the disturbances except that they are stochastic.
It is the structure which specifies their distributions, either directly or implicatively by
specifying the constants $\beta$ and $\gamma$ and the distribution of the $y$'s.


19. In the general parametric case represented by equation (18) an equation is un- 
identifiable if and only if it is possible to construct another equation, by linear combination 
of the equations in the model, which also conforms to the specification constituting that 
equation. This is evidently sufficient. The article by Koopmans, Rubin and Leipnik in 
Monograph 10 shows that it is also necessary. Only linear combinations can bring about 
unidentifiability except in some very special cases when there is functional dependence 
among lagged endogenous variates. 

It follows immediately that when nothing is specified about the disturbances a necessary
and sufficient condition for the identifiability of an equation $A$ in a linear model, which is
subject to the condition of specification that certain endogenous elements are absent
from $A$, is that if we form determinants of order $G - 1$ from the coefficients with which those
excluded variables occur in the other equations of the model, one determinant at least does
not vanish. I refer to this below as the rank condition.

For example, in the simple case of equations (24) the parameters are all identifiable if we
add an exogenous variate $x$ to one equation, giving, say,

$$\begin{aligned} q - \beta_0 - \beta_1 p &= \epsilon + x, \\ q - \gamma_0 - \gamma_1 p &= \eta. \end{aligned} \tag{26}$$










If we suppose, without loss of generality, that the disturbances and x have zero mean it is 
easy to see that consistent estimators of the constants are given by 
bo = % = 9, 
$$\sum qx - c_1 \sum px = 0, \tag{27}$$
$$\sum qx - b_1 \sum px = \sum x^2,$$
where summation is over sample values, or, in the limit, denotes moments of the second 
order. It is to be remembered that x is independent of € and 7. 
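The slope estimators in (27) can be tried on artificial data. The sketch below is illustrative only and makes assumptions beyond the text: the intercepts are taken as zero, $x$, $\epsilon$ and $\eta$ are independent unit normal variates, and the slopes $-1$ and $1$ are hypothetical values. The least-squares slope of $q$ on $p$ is included for contrast.

```python
import random


def simulate_market(n, beta1, gamma1, seed=0):
    """Draw (q, p, x) from system (26) with zero intercepts and unit-variance normals."""
    rng = random.Random(seed)
    q, p, x = [], [], []
    for _ in range(n):
        xi, eps, eta = (rng.gauss(0.0, 1.0) for _ in range(3))
        pi = (eps + xi - eta) / (gamma1 - beta1)   # difference of the two equations
        q.append(gamma1 * pi + eta)                # second equation of (26)
        p.append(pi)
        x.append(xi)
    return q, p, x


def dot(a, b):
    """Sum of products of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


beta1, gamma1 = -1.0, 1.0     # hypothetical demand and supply slopes
q, p, x = simulate_market(100_000, beta1, gamma1)

c1 = dot(q, x) / dot(p, x)                 # from sum(qx) - c1 sum(px) = 0
b1 = (dot(q, x) - dot(x, x)) / dot(p, x)   # from sum(qx) - b1 sum(px) = sum(x^2)
ols = dot(q, p) / dot(p, p)                # regression of q on p, for contrast

print(f"b1 = {b1:.3f}, c1 = {c1:.3f}, least squares slope = {ols:.3f}")
```

Under these assumptions $b_1$ and $c_1$ settle near $-1$ and $1$, while the least-squares slope tends to $1/3$, which is neither the demand nor the supply coefficient.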


20. It may be useful at this point to exemplify the case of over-identification. If we have
two exogenous variates $x$ and $y$ we might have, instead of (26), such a system as

$$\begin{aligned} q - \beta_0 - \beta_1 p &= \epsilon + x + y, \\ q - \gamma_0 - \gamma_1 p &= \eta. \end{aligned}$$

The system would be identifiable if either $x$ or $y$ were absent and hence is over-identified.
The point of distinguishing the over-identified case is that it modifies our methods of
estimation. In fact, if $x$ and $y$ are independent we have, on multiplying the first equation
by $x$ and $y$ respectively and taking expectations,

$$\begin{aligned} E(qx) - b_1 E(px) &= E(x^2), \\ E(qy) - b_1 E(py) &= E(y^2), \end{aligned} \tag{28}$$

and similarly, on multiplying the second,

$$\begin{aligned} E(qx) - c_1 E(px) &= 0, \\ E(qy) - c_1 E(py) &= 0. \end{aligned} \tag{29}$$

These equations are not, in general, consistent, and we therefore require some rule of
procedure in estimation which avoids the conflict and, so to speak, combines the two
estimators of $b_1$ in (28) or of $c_1$ in (29) into a best estimator. I revert to the point in §28.
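The conflict between the two estimators in each of (28) and (29) shows itself in any finite sample. The following sketch rests on the same hypothetical assumptions as before (zero intercepts, independent unit normal $x$, $y$, $\epsilon$, $\eta$, slopes $-1$ and $1$) and replaces expectations by sample moments.

```python
import random


def simulate(n, beta1, gamma1, seed=0):
    """Draw (q, p, x, y) from the over-identified system with zero intercepts."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y, eps, eta = (rng.gauss(0.0, 1.0) for _ in range(4))
        p = (eps + x + y - eta) / (gamma1 - beta1)   # difference of the two equations
        q = gamma1 * p + eta                          # second equation
        rows.append((q, p, x, y))
    return rows


def moment(rows, i, j):
    """Sample second-order moment of columns i and j."""
    return sum(r[i] * r[j] for r in rows) / len(rows)


Q, P, X, Y = 0, 1, 2, 3
rows = simulate(5_000, beta1=-1.0, gamma1=1.0)

# Two estimates of c1 from (29), one per exogenous variate.
c1_from_x = moment(rows, Q, X) / moment(rows, P, X)
c1_from_y = moment(rows, Q, Y) / moment(rows, P, Y)
# Two estimates of b1 from (28).
b1_from_x = (moment(rows, Q, X) - moment(rows, X, X)) / moment(rows, P, X)
b1_from_y = (moment(rows, Q, Y) - moment(rows, Y, Y)) / moment(rows, P, Y)

print(f"c1: {c1_from_x:.3f} vs {c1_from_y:.3f}")
print(f"b1: {b1_from_x:.3f} vs {b1_from_y:.3f}")
```

Each pair of values lies near the true slope but the members of a pair do not agree, so that some rule for combining them is needed.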


21. Let us return to consider the other possibility of resolving the identifiability problem, 
namely, the specification of the nature of the disturbances. It has been noted above in 
§17 that we may impose conditions of normality and independence without necessarily
making the system identifiable. The question arises: What are the necessary and sufficient 
conditions to be placed upon the disturbances for the system to become identifiable? From 
a consideration of the rank condition (§19) this amounts to asking: What specifications on 
the disturbances are equally obeyed by linear functions of them ? 

Normality, as is well known, is not the only distributional property which is reproduced 
by linear functions. The Cauchy and Poisson laws behave in the same way for certain linear 
combinations. But the normal distribution is the only continuous one with finite second 
moments with the required property of stability, and if (as seems inevitable for practical 
discussions) we limit ourselves to such types of distributions we are confining ourselves 
effectively to the normal case, provided that the disturbances are independent. Thus we 
may say that if we postulate that the disturbances are independent and do not follow 
a stable law, the system is identifiable. Unfortunately, this is not much of a help. We usually 
do not know what the law of the disturbances is or whether they are independent; even 
if we assume independence we usually assume normality in order to resolve the estimation 
problem; and in any case our sample is rarely large enough for us to be able to distinguish






















normal from non-normal variation in the disturbances. It would seem that in practice we 


are forced back on to the alternative method of introducing exogenous variates into the 
equations. 


22. Nevertheless, there remain in this field a number of interesting theoretical problems 
for solution. There seems to be some essential relation between normality and indeterminacy 
in linear systems (as exemplified by Geary’s results and the unidentifiability results referred 
to above). Normality implies indeterminacy. The question is how far the converse is true, 
i.e. that only normality implies indeterminacy. In terms of the identifiability problem, 
suppose that we require of our disturbances that they follow a particular law but are not 
necessarily independent; how far will linear functions obey that law (a) for a finite set of
linear functions and (b) for an infinite set of linear functions?

It would be pleasant to be able to prove a general theorem to the effect that if $u_1, u_2, \ldots, u_k$
all have the same law of distribution and $\sum \lambda_i u_i$ also has that law for arbitrary $\lambda$'s, then the
$u$'s are normal. Unfortunately, this is not true. Lindley (1950) has recently shown, for
example, that if $u_1$ and $u_2$ have finite variances and zero means then $\lambda u_1 + u_2$ is distributed
in the same form for all $\lambda$'s if and only if $u_1$ and $u_2$ have an elliptical distribution. Broadly
speaking, this means among other things that the density function of $u_1$ and $u_2$ is constant
over ellipses $a u_1^2 + b u_2^2 = \text{constant}$. More specifically, it is necessary and sufficient for such
a distribution to exist that, if $\phi(t)$ is the characteristic function of $u$, the function

$$G(R) = \lim_{T \to \infty} \int_0^T t\,dt \int_0^R J_0(tR)\, R\,dR\; \phi(t) \tag{30}$$

shall be a distribution function.

23. We have got rather a long way from the problem of identifiability as it originally 
arose, and I return to a more general discussion. The basic concept involved concerns the 
extent to which an examination of the data (however extensive) will permit of a distinction 
between certain classes of hypotheses. However, identifiability in the present sense is not 
the same thing as distinguishability. Suppose, for example, we find that a series can be
represented by a Yule autoregressive equation

$$u_{t+2} + \alpha u_{t+1} + \beta u_t = \epsilon_{t+2}, \tag{31}$$

where the $\epsilon$'s are random. Such a series could equally well be generated by

$$u_{t+1} + \rho u_t = \eta_{t+1}, \tag{32}$$

where the $\eta$'s themselves obey

$$\eta_{t+2} + \tau \eta_{t+1} = \epsilon_{t+2}, \tag{33}$$

provided that

$$\alpha = \rho + \tau, \qquad \beta = \rho\tau. \tag{34}$$


How, then, do we distinguish between two cases? Not on the basis of observation of the 
u’s. So far as concerns the observed series, the generating scheme may be either of the Yule 
type with random disturbances or the Markoff type with disturbances which are themselves 
of the Markoff type. The hypotheses are observationally indistinguishable, and indeed are 
equivalent unless we depart from the attitude that they are descriptive and that the 
constants $\alpha$, $\beta$, $\rho$, $\tau$ represent some property of the system such as would, for example,
enable us to write down structural equations. In the latter case there is a difference: in (31) 
there are two endogenous constants, whereas in (32) and (33) one constant is endogenous and 













the other exogenous. This would be important if we were interested in the prediction of
behaviour when structural parameters alter their value, for the exogenous parameter $\tau$ is
not in general capable of alteration. The indistinguishability of the two models is not 
a matter of identifiability. It depends on the prior classification into endogenous and 
exogenous elements. Identifiability arises in connexion with lack of distinction within the 
same model. 
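The observational equivalence of (31) with the pair (32) and (33) can be verified term by term. In the sketch below the values of $\rho$ and $\tau$, the starting values and the function names are hypothetical; the same random $\epsilon$'s drive both schemes, with $\alpha$ and $\beta$ fixed by (34).

```python
import random


def yule_series(alpha, beta, eps, u0, u1):
    """Generate u from (31): u[t+2] + alpha*u[t+1] + beta*u[t] = eps[t+2]."""
    u = [u0, u1]
    for t in range(len(eps) - 2):
        u.append(eps[t + 2] - alpha * u[t + 1] - beta * u[t])
    return u


def markoff_series(rho, tau, eps, u0, u1):
    """Generate u from (32) with disturbances eta which themselves obey (33)."""
    u = [u0, u1]
    eta_prev = u1 + rho * u0                      # eta_1 implied by (32)
    for t in range(len(eps) - 2):
        eta_next = eps[t + 2] - tau * eta_prev    # (33)
        u.append(eta_next - rho * u[t + 1])       # (32)
        eta_prev = eta_next
    return u


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

rho, tau = -0.5, -0.3                  # hypothetical constants
alpha, beta = rho + tau, rho * tau     # relations (34)

u_yule = yule_series(alpha, beta, eps, 0.0, 0.0)
u_markoff = markoff_series(rho, tau, eps, 0.0, 0.0)

max_gap = max(abs(a - b) for a, b in zip(u_yule, u_markoff))
print(f"largest difference between the two series: {max_gap:.2e}")
```

The two series coincide apart from rounding, so that nothing in the observed $u$'s can separate the schemes.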


24. One last point on identifiability. It has been pointed out above that the difficulties 
are not removed by increasing the number of observations. They are, as it were, inherent 
in the formulation of the model. At first sight it appears contradictory for Koopmans
& Reiersøl to say (1950, p. 170) that identifiability is subject to statistical test. Indeed, it
would be so if they meant that we can determine from the observations whether a model
is identifiable, that is to say, whether a structural system is identifiable for arbitrary values
of the parameters. What they mean, if I have interpreted them correctly, is that there may
be certain parameter values which render a model identifiable and others which leave it
unidentifiable, and that it is possible to ascertain by examination of the data whether a given
parameter falls into one class or the other. This, I think, amounts to saying, in the
parametric case, that if we can estimate the parameters we can also test the significance of the
estimates and say whether the vanishing of the determinants required by the rank condition
(§19) is possible or not within acceptable sampling limits.


ESTIMATION 


25. I proceed to consider the estimation of the parameters in equations of type (18).
Of the two methods which have been put forward, one (the maximum-likelihood approach)
requires a specification to the effect that the probability distribution of the disturbance is 
known at least in general form; the other (least squares) must either cast back to the 
maximum-likelihood method for normal disturbances or adopt the principle of least 
squares itself as a principle of inference. This is reasonable enough when conditions are such 
that we can apply Gauss’s theorem that least-squares solutions in linear problems give 
minimal variance to the estimators. But where time-lags are present the theorem may not 
apply, and if we use it we are making a non-proven extension of the theorem. The general 
problem of formulating a valid system of estimation when there are autocorrelations in the 
residuals has not been solved. 


26. The customary assumption to make is that the disturbances are distributed in the 
multivariate normal form, and it is noteworthy that in such circumstances the maximum- 
likelihood estimators are independent of the dispersion matrix of disturbances, so that the 
problem becomes determinate without any specific assumption concerning that matrix. 
The actual process of solution may be troublesome, arithmetically speaking, but does not 
raise any new theoretical difficulties so far as I am aware.


27. The explicit solution of a complete set of structural equations, however, is frequently
unnecessary, in the sense that we are interested in only one of the equations, and inconvenient, 
in the sense that some of the variates occurring elsewhere may be difficult to measure. We 
are not justified, in general, in writing down the one equation and ignoring the others; such 
a procedure may introduce bias (as was emphasized by Haavelmo (1944)). But having 
written down the complete system we are justified in considering whether we can estimate 
the coefficients in one equation without having to estimate uninteresting coefficients 












occurring in other equations. Such a method has, in fact, been devised by T. W. Anderson 
(see paper IX in Monograph 10). It is known as the ‘reduced form method’, or, more 
descriptively but far less conveniently, as ‘the limited-information maximum-likelihood 
method’. Essentially the process is a return to the regression approach. By a solution of 
type (20) we express the endogenous elements under examination in terms of a regression on 
the predetermined variables. Reference may be made to Anderson’s paper in Monograph 10 
or to Anderson & Rubin (1949) for the details. 


28. It is very natural to inquire why we do not use equation (20) in finding the solution 
of the general structural set, that is to say, why we do not treat the problem as one in the 
determination of simultaneous regression equations. The answer is that the methods are 
equivalent when the system is just identified by its exogenous variates. The only difference, 
if I have understood the situation correctly, arises when the system is over-identified, in 
which case the ordinary maximum-likelihood method is better, but the limited-information 
method gains in ease at the expense of ignoring relevant information. In the only practical 
case where such equations have been worked out in great detail, Klein's study (1950) on the
U.S.A. economy, the reduced-form solutions are very close to the maximum-likelihood
solutions for the complete set.


29. Clearly much more critical study is necessary on practical data before we can say 
that the approach through structural equations is fruitful. Conceptually it seems to me 
sound enough so far as concerns the use of prior knowledge or analysis in model-building, 
but the error-in-variable approach may prove to be better. Practical results in macro- 
economics have not yet been very encouraging. It may be that econometricians have tried 
to do too much in formulating equations concerning a whole economy instead of con- 
centrating on a single commodity or a small local market. Or it may be that the key to the 
successful use of the methods lies in our treatment of the disturbance functions, and that we 
cannot sweep everything which is unknown or inconvenient into a disturbance term and then 
assume that it behaves like an observational error. What are required here, apart from more 
extensive data, are (a) methods of testing the disturbance terms to see how far our hypo- 
theses concerning them are plausible, (b) methods of estimation which are partly independent 
of the nature of the disturbances, so as to give us greater latitude in the hypotheses, and 
(c) investigations into the question how far departures from such hypotheses affect the 
estimation. In other statistical fields such topics have been investigated with considerable 
profit. There seems no reason to doubt that similar investigations would be rewarding in the 
theory of structure. 


30. To round off this review of the theory of regression and structural analysis I should, 
perhaps, have mentioned in more detail the use of instrumental variates and problems of 
multicollinearity connected with Frisch’s confluence analysis. My colleague Mr Durbin has, 
however, been working in this field and I therefore leave these topics in his hands for later 
treatment. 


A first draft of this paper was read by Mr J. Durbin, Mr D. V. Lindley and Dr P. A. P. 
Moran, to whom I am very grateful for a number of constructive comments and criticisms. 













[Note added in proof.] In regard to Berkson's proposal (§§ 4-6) there is another possible
way of looking at the matter which well illustrates the importance of an unambiguous
symbolism and nomenclature in this subject. If we denote observed quantities by a prime,
the procedure described in § 5, wherein a value of the variable is aimed at, should be written

$$x_{X'} = X' + \epsilon_{X'}, \tag{35}$$

where $\epsilon$ is a variate and hence the latent element $x_{X'}$ is also a variate. The model now
asserts that

$$y' = \alpha x_{X'} + \eta, \tag{36}$$

that is to say, the basic relation is not a functional but a structural equation connecting
the variates $y'$, $x_{X'}$ and $\eta$. If we substitute from (35) we have

$$y' = \alpha X' + (\alpha \epsilon_{X'} + \eta). \tag{37}$$

The term in brackets on the right is a variate and the estimation of $\alpha$ is then a problem
in regression analysis, leading to the least-squares estimate $\sum y'X' / \sum X'^2$ for $\alpha$. This is
unbiased for fixed $X'$ and hence follows Berkson's result. But it does not prove that there
is only one regression or affect the estimation of constants in a functional relationship.
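The unbiasedness asserted for fixed $X'$ can be illustrated by simulation. The sketch below is a hypothetical instance of (35) to (37) and not Berkson's own example: the aimed-at values cycle through 1 to 10, both $\epsilon$ and $\eta$ are unit normal variates, and $\alpha = 1.5$ is an assumed value.

```python
import random


def berkson_slope(alpha, n, seed=0):
    """Least-squares slope through the origin when the aimed-at values X' are fixed."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for i in range(n):
        target = 1.0 + (i % 10)                        # X': value aimed at, fixed by design
        x_true = target + rng.gauss(0.0, 1.0)          # (35): latent value actually attained
        y_obs = alpha * x_true + rng.gauss(0.0, 1.0)   # (36): observed response
        sxy += y_obs * target
        sxx += target * target
    return sxy / sxx


estimate = berkson_slope(alpha=1.5, n=20_000)
print(f"least-squares estimate of alpha: {estimate:.4f}")
```

The estimate settles close to the assumed $\alpha$ because the composite term $\alpha \epsilon_{X'} + \eta$ in (37) is independent of the fixed $X'$.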


REFERENCES 


Anderson, T. W. & Rubin, H. (1949). Ann. Math. Statist. 20, 46.

Berkson, J. (1950). J. Amer. Statist. Ass. 45, 160.

Geary, R. C. (1942). Proc. Roy. Irish Acad. 47A, 63.

Geary, R. C. (1943). Proc. Roy. Irish Acad. 49A, 177.

Geary, R. C. (1948). J.R. Statist. Soc. B, 10, 140.

Haavelmo, T. (1944). Econometrica, 12, supplement.

Klein, L. (1950). Economic Fluctuations in the United States, 1921-1941. New York: John Wiley and
Sons; London: Chapman and Hall.

Koopmans, T. (ed.) (1950). Statistical Inference in Dynamic Economic Models. New York: John Wiley
and Sons; London: Chapman and Hall.

Koopmans, T. & Reiersøl, O. (1950). Ann. Math. Statist. 21, 165.

Lindley, D. V. (1950). Proc. Camb. Phil. Soc. 47, 337.

Theil, H. (1950). Indagationes Mathematicae, 12, fasc. 2.

Wald, A. (1940). Ann. Math. Statist. 11, 284.







ON THE CONCURRENCE OF A SET OF REGRESSION LINES 
By K. D. TOCHER 
Imperial College of Science and Technology 


INTRODUCTION AND SUMMARY 


Many experimental methods determine unknown datum points by plotting two linearly 
related observables against each other and observing the intercept on one of the axes. It is 
frequently suspected that the datum changes between experiments, and a test of this is 
required. 

Three cases arise: first, only one observable is subject to error, and the intercept required is 
on the axis of that observable; secondly, the required intercept is on the axis of the observable 
free from error; and, lastly, both observables are subject to error. 

The first case is solved by a straightforward application of analysis of variance, while the 
last, although strictly the correct model for all real situations, is beset by many difficulties 
and, in the absence of any completely satisfactory procedure for solving the ordinary 
regression problem, is not considered here. Only the second case, which is a close approxi- 
mation to the situation most frequently arising in this experimental technique, is considered 
here. 

The usual restriction that all errors have equal variance is imposed, but could be relaxed 
in the case where their ratios are known. A further assumption of normality of error is made, 
but deviations from this are no more serious than in the usual analysis of variance. 

In general, the critical significance level of the test developed is a root of a complicated 
equation, best obtained by an iterative process described here in detail, but in special cases 
an exact analytic solution is possible. 

If a common intercept does exist (in the sense that the test does not reject that hypothesis) a confidence interval for it may be obtained. The whole process is illustrated by a numerical example, and the extension to composite tests of equality within groups of sets of lines is given.

THE PROBLEM 


Consider a group of $m$ sets of data, the $i$th set consisting of $n_i$ pairs $(x_{ir}, y_{ir})$ $(r = 1, 2, \ldots, n_i;\ i = 1, 2, \ldots, m)$. We assume that there is a linear regression of $y$ on $x$ for each set, viz.

$$E(y_{ir}) = \alpha_i + \beta_i x_{ir}. \tag{1}$$

The $x_{ir}$ are assumed known, and the $y_{ir}$ normally and independently distributed about their mean values with a constant variance of $\sigma^2$. The line for the $i$th set meets the $x$-axis at a value $\xi_i = -\alpha_i/\beta_i$. We require to test the hypothesis, $H$, that $\xi_i = \text{const.}$ $(i = 1, 2, \ldots, m)$.


THE SOLUTION 


The hypothesis $H$ is a composite one, since the common value for the intercepts is unspecified. Consider one of the constituent simple hypotheses $H(x_0)$ that $\xi_i = x_0$ $(i = 1, 2, \ldots, m)$, i.e. that

$$\alpha_i + \beta_i x_0 = 0. \tag{2}$$

This is a linear constraint on the unknown parameters $\alpha_i$, $\beta_i$ and the usual analysis of variance test can be applied. Put

$$\sum_r x_{ir} = n_i\bar x_i, \qquad \sum_r y_{ir} = n_i\bar y_i,$$
$$\sum_r (x_{ir} - \bar x_i)^2 = s_i, \qquad \sum_r (y_{ir} - \bar y_i)^2 = S_i, \tag{3}$$
$$\sum_r (x_{ir} - \bar x_i)(y_{ir} - \bar y_i) = p_i.$$


The unrestricted sum of squares is

$$S = \sum_i \sum_r (y_{ir} - \alpha_i - \beta_i x_{ir})^2, \tag{4}$$

with minimum value

$$S_0 = \sum_i (S_i - p_i^2/s_i), \tag{5}$$

which has a $\chi^2\sigma^2$ distribution with $\sum_i n_i - 2m = N - 2m$ degrees of freedom. $S_0$ is obtained from $S$ by substituting the least-squares estimates of $\alpha_i$ and $\beta_i$ which are

$$a_i = \bar y_i - b_i\bar x_i, \qquad b_i = p_i/s_i. \tag{6}$$

The restricted sum of squares obtained by substituting (2) in (4) is

$$S' = \sum_i \sum_r \{y_{ir} - \beta_i(x_{ir} - x_0)\}^2. \tag{7}$$

The least-squares estimates of $\beta_i$ in this case are

$$b_i' = \frac{p_i + n_i(\bar x_i - x_0)\bar y_i}{s_i + n_i(\bar x_i - x_0)^2},$$

which reduce (7) to

$$S_0' = S_0 + \sum_i \left[\frac{p_i^2}{s_i} + n_i\bar y_i^2 - \frac{\{p_i + n_i(\bar x_i - x_0)\bar y_i\}^2}{s_i + n_i(\bar x_i - x_0)^2}\right].$$





On the hypothesis $H(x_0)$, $S_0' - S_0$ is the sum of squares due to restraints and has a $\chi^2\sigma^2$ distribution with $m$ degrees of freedom, independent of $S_0$. After some algebraic reduction, we find that

$$S_0' - S_0 = f(x_0) = \sum_i \frac{n_i\{s_i\bar y_i - p_i(\bar x_i - x_0)\}^2}{s_i\{s_i + n_i(\bar x_i - x_0)^2\}} \tag{8A}$$

$$= \sum_i \frac{n_i p_i b_i (x_{0i} - x_0)^2}{s_i + n_i(\bar x_i - x_0)^2}, \tag{8B}$$

where $x_{0i}$ is the estimated intercept of the $i$th line, $-a_i/b_i$.
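The equivalence of the forms (8A) and (8B) with the direct difference of the two minimized sums of squares can be checked numerically. The following is a minimal sketch only: the function names and the small data set are illustrative assumptions, not part of the paper.

```python
def line_stats(x, y):
    """Return n, xbar, ybar, s, S, p for one set of (x, y) pairs, as in (3)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    S = sum((yi - ybar) ** 2 for yi in y)
    p = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return n, xbar, ybar, s, S, p


def f_8a(sets, x0):
    """f(x0) in the form (8A)."""
    total = 0.0
    for x, y in sets:
        n, xbar, ybar, s, _, p = line_stats(x, y)
        d = xbar - x0
        total += n * (s * ybar - p * d) ** 2 / (s * (s + n * d * d))
    return total


def f_8b(sets, x0):
    """f(x0) in the form (8B), using the estimated intercepts x_0i = -a_i/b_i."""
    total = 0.0
    for x, y in sets:
        n, xbar, ybar, s, _, p = line_stats(x, y)
        b = p / s
        a = ybar - b * xbar
        x0i = -a / b
        total += n * p * b * (x0i - x0) ** 2 / (s + n * (xbar - x0) ** 2)
    return total


def f_direct(sets, x0):
    """S_0' - S_0 computed directly from the restricted and unrestricted fits."""
    diff = 0.0
    for x, y in sets:
        _, _, _, s, S, p = line_stats(x, y)
        unrestricted = S - p * p / s
        sxy = sum((xi - x0) * yi for xi, yi in zip(x, y))
        sxx = sum((xi - x0) ** 2 for xi in x)
        restricted = sum(yi * yi for yi in y) - sxy * sxy / sxx
        diff += restricted - unrestricted
    return diff


# Made-up illustrative data: two short sets of (x, y) pairs.
example_sets = [
    ([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]),
    ([0.0, 2.0, 4.0], [0.9, 3.1, 4.8]),
]
```

Evaluating the three functions at the same trial intercept, for instance `x0 = -0.5`, should give values that agree to rounding error.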
Thus under $H(x_0)$, $\dfrac{N-2m}{m}\,\dfrac{f(x_0)}{S_0}$ has the $F$ distribution on $m$ and $N - 2m$ degrees of freedom, so that this hypothesis is contradicted at the $\alpha$ % significance level if

$$f(x_0) > \frac{mS_0}{N-2m}\,F_\alpha. \tag{9}$$

The hypothesis $H$ is the disjunction of the $H(x_0)$ over all $x_0$. Thus $H$ is contradicted at the $\alpha$ % level if (9) holds for all $x_0$. Conversely, the critical significance level for $H$ is given by the $F$ ratio

$$F_c = \frac{N-2m}{mS_0}\,\min_{x_0} f(x_0). \tag{10}$$





The value of $x_0$ minimizing $f(x_0)$ is the maximum-likelihood estimate of the common intercept, since $f(x_0)$ only differs from $S_0'$ by the constant $S_0$. The exact interpretation of the significance level is discussed in the next section.


If, for a given level $\alpha$, (9) does not hold for all $x_0$, the exceptional $x$'s form a set of possible values of the intercept, which contains the true common intercept, when it exists, in all but $\alpha$ % of any sequence of independent trials. Thus they form a confidence set for $x_0$, and since, as we shall see later, $f(x_0)$ is parabolic in shape, this set is the interval between the two roots of

$$f(x_0) = \frac{mS_0}{N-2m}\,F_\alpha$$

and so constitutes a confidence interval for the common intercept.


COMMENTS ON THE SOLUTION 


The classical method of determining a test function for a composite hypothesis considers 
only functions whose probability distributions are independent of the ‘nuisance’ para- 
meters; in the Neyman-Pearson terminology these functions define similar regions, while 
the older phrasing is that the appropriate statistic for the corresponding simple hypothesis 
has been ‘Studentized’. From this set of functions one is chosen that has certain optimum 
properties.
Neyman & Pearson (1933) have shown how to determine such regions under certain 
conditions, but if those do not hold, there is no general method of procedure known, and for 
practical purposes we may assume that similar regions do not exist. It is easily shown that 
the conditions do not hold in this problem of testing equality of intercepts. 
Another example of a composite hypothesis with no similar regions is the testing of 
equality of performance in a comparative trial. In his discussion of such tests Barnard 
(1947) suggested that the significance level of a test function should be taken as the maximum 
probability of exceeding that value of the test function, considered as a function of the 
nuisance parameters. We are then sure that no matter what the true values of these 
parameters may be, the probability of making the mistake of rejecting the hypothesis 
when it is true is certainly less than the significance level.
It is in this sense that the criterion suggested for the test of equality of intercepts gives 
a significance level. The method could be applied to any test function of the constituent 
simple hypothesis, and the choice of function is governed by the required optimum 
properties of differentiation from alternative hypotheses. Our test has been based on the 
maximum-likelihood ratio test which has shown its value as a general method in many 
instances. Kolodziejczyk (1935) has shown that for the simple hypotheses there are no uniformly most powerful tests, and this indicates that the same will be true for the composite hypothesis.


SPECIAL CASE 


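In the special case of all sets of data having common values of the $x$ an exact analytic solution is possible.

Put $n_i = n$, $\bar x_i = \bar x$, $s_i = s$ and $\bar x - x_0 = \delta$, then the form (8A) for $f(x_0)$ reduces (9) to the inequality

$$s^2\sum_i \bar y_i^2 - 2s\delta\sum_i p_i\bar y_i + \delta^2\sum_i p_i^2 > \frac{mS_0F_\alpha}{n(N-2m)}\,s(s + n\delta^2) = K_\alpha\,s(s + n\delta^2), \text{ say.} \tag{11}$$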











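If this is true for all $\delta$ we must have

$$\sum_i p_i^2 - nsK_\alpha > 0 \tag{12}$$

and

$$\Big(\sum_i p_i\bar y_i\Big)^2 - \Big(\sum_i \bar y_i^2 - K_\alpha\Big)\Big(\sum_i p_i^2 - nsK_\alpha\Big) < 0,$$

or on rearranging, that

$$nsK_\alpha^2 - \sum_i (p_i^2 + ns\bar y_i^2)\,K_\alpha + \sum_i \bar y_i^2 \sum_i p_i^2 - \Big(\sum_i p_i\bar y_i\Big)^2 > 0. \tag{13}$$

Replacing the inequality in (13) by an equality, the required significance level is obtained from its smaller root, i.e. from

$$F_c = \frac{N-2m}{2msS_0}\left[\sum_i (p_i^2 + ns\bar y_i^2) - \Big\{\Big(\sum_i (p_i^2 - ns\bar y_i^2)\Big)^2 + 4ns\Big(\sum_i p_i\bar y_i\Big)^2\Big\}^{1/2}\right]. \tag{14}$$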
In numerical examples, the terms of (14) may be large and nearly equal. In this case it is best to calculate the coefficients of (13) and solve it by a series. If the quadratic is

$$x^2 - ax + b = 0 \quad (0 < b < a),$$

the lower root is

$$\frac{b}{a}\left\{1 + \frac{b}{a^2} + 2\Big(\frac{b}{a^2}\Big)^2 + \ldots\right\}.$$

The quoted terms are sufficient to give the root, if this method is necessary.

The confidence interval is obtained by rearranging (11) as the quadratic in $\delta$,

$$\Big(\sum_i p_i^2 - nK_\alpha s\Big)\delta^2 - 2s\Big(\sum_i p_i\bar y_i\Big)\delta + s^2\Big(\sum_i \bar y_i^2 - K_\alpha\Big) = 0, \tag{15}$$

giving two roots $\delta_1$, $\delta_2$ and hence a confidence interval $[\bar x - \delta_1,\ \bar x - \delta_2]$ for $x_0$.
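The series for the lower root of $x^2 - ax + b = 0$ quoted above can be sketched in code as follows. This is an illustration under assumptions: the function names are hypothetical, and the comparison formula $2b/\{a + \sqrt{(a^2 - 4b)}\}$ is a standard cancellation-free rewriting of the smaller root, not something given in the paper. The coefficients 1, 1, 2, 5, 14 are the first terms of the expansion of $\{1 - \sqrt{(1 - 4t)}\}/(2t)$ with $t = b/a^2$.

```python
import math


def smaller_root_series(a, b, terms=3):
    """Lower root of x^2 - a x + b = 0 from (b/a){1 + t + 2 t^2 + ...}, t = b/a^2."""
    t = b / (a * a)
    coefficients = [1, 1, 2, 5, 14]  # further terms of the same expansion
    return (b / a) * sum(c * t ** k for k, c in enumerate(coefficients[:terms]))


def smaller_root_stable(a, b):
    """Lower root written so that no two nearly equal numbers are subtracted."""
    return 2.0 * b / (a + math.sqrt(a * a - 4.0 * b))


# Coefficients of the reduced quadratic in the paper's numerical example.
root = smaller_root_series(1.699485, 7.071003e-7)
```

With these coefficients the three quoted terms already agree with the closed form far beyond the four significant figures used in the example.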
It should be noted that (12) should hold throughout this analysis. Writing (12) as

$$\frac{N-2m}{mS_0}\sum_i \frac{p_i^2}{s} > F_\alpha, \tag{16}$$

we observe that this requires the usual analysis of variance test to detect a difference in slope in the lines at the level $\alpha$.

If the lines can be assumed to have a common slope, the problem reduces to determining 
if the lines are identical, a problem soluble by standard techniques. 


AN ITERATIVE SOLUTION OF THE GENERAL CASE 


In general, the minimum value of $f(x)$ can be obtained by a lengthy process of interpolation in tables of $f(x)$ and $f'(x)$, but in the majority of cases this may be replaced by an iterative procedure starting from the following simple approximation.

If the variation of estimated intercept from line to line is small, the unknown $x_0$ in the denominator of each term of (8B) can be replaced by $x_{0i}$, reducing $f(x)$ to a weighted sum of squares

$$f(x) = \sum_i \omega_i (x_{0i} - x)^2, \tag{17}$$

where

$$\omega_i = \frac{n_i p_i b_i}{s_i + n_i(\bar x_i - x_{0i})^2}.$$








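This approximation produces a minimum value

$$\operatorname{Min} f(x) = \sum_i \omega_i x_{0i}^2 - \frac{(\sum_i \omega_i x_{0i})^2}{\sum_i \omega_i} \tag{18}$$

at the $x$ value $\sum_i \omega_i x_{0i} / \sum_i \omega_i$.

It is of interest to note that the weights used in this weighted sum of squares are those obtained by an application of the well-known device to obtain the variances of the individual intercepts:*

$$V(x_{0i}) = V\Big(\frac{\bar y_i}{b_i}\Big) \simeq \frac{1}{b_i^2}V(\bar y_i) + \frac{\bar y_i^2}{b_i^4}V(b_i) = \frac{\sigma^2}{b_i^2 n_i} + \frac{\bar y_i^2\sigma^2}{b_i^4 s_i} = \frac{s_i + n_i(\bar x_i - x_{0i})^2}{n_i p_i b_i}\,\sigma^2 = \frac{\sigma^2}{\omega_i}.$$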


The iteration is required to obtain an exact solution to the equation

$$f'(x) = -2\sum_i \frac{n_i p_i b_i\{s_i + n_i(\bar x_i - x)(\bar x_i - x_{0i})\}(x_{0i} - x)}{\{s_i + n_i(\bar x_i - x)^2\}^2} = 0, \tag{19}$$

and a suitable one is obtained if we put

$$x^{(n)} = \frac{\sum_i \omega_i^{(n)} x_{0i}}{\sum_i \omega_i^{(n)}}, \tag{20}$$

where

$$\omega_i^{(n+1)} = \frac{n_i p_i b_i\{s_i + n_i(\bar x_i - x^{(n)})(\bar x_i - x_{0i})\}}{\{s_i + n_i(\bar x_i - x^{(n)})^2\}^2},$$

starting from the initial values $\omega_i^{(0)} = \omega_i$.

If (20) converges, it clearly converges to a root of (19), and this root, being a weighted mean of the $x_{0i}$, lies in the range of the $x_{0i}$.
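The iteration (20) can be sketched as below, applied to the two-filter data of the paper's numerical example. The function names, the stopping rule and the iteration cap are assumptions made for illustration; only the update formulae come from the text.

```python
# Data of the numerical example: one set of (x, y) pairs per filter.
X = [2278.0, 2406.0, 2524.0, 2638.0, 2760.0]
SETS = [
    (X, [0.802, 0.882, 0.956, 1.028, 1.101]),
    (X, [0.739, 0.812, 0.879, 0.944, 1.012]),
]


def summarise(x, y):
    """Return (n, xbar, s, p, b, x0i) for one line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    p = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = p / s
    x0i = xbar - ybar / b  # estimated intercept -a_i/b_i
    return n, xbar, s, p, b, x0i


def f(lines, x):
    """f(x) in the form (8B)."""
    return sum(n * p * b * (x0i - x) ** 2 / (s + n * (xbar - x) ** 2)
               for n, xbar, s, p, b, x0i in lines)


def weight(line, x):
    """The weight omega_i evaluated at the current value x, as in (20)."""
    n, xbar, s, p, b, x0i = line
    return (n * p * b * (s + n * (xbar - x) * (xbar - x0i))
            / (s + n * (xbar - x) ** 2) ** 2)


def minimise_f(lines, tol=1e-12, max_iter=100):
    # Initial weights omega_i^(0) = omega_i, i.e. the weight evaluated at x_0i.
    w = [weight(line, line[5]) for line in lines]
    x = sum(wi * line[5] for wi, line in zip(w, lines)) / sum(w)
    for _ in range(max_iter):
        w = [weight(line, x) for line in lines]
        x_new = sum(wi * line[5] for wi, line in zip(w, lines)) / sum(w)
        if abs(x_new - x) <= tol * max(1.0, abs(x_new)):
            return x_new
        x = x_new
    raise RuntimeError("iteration (20) did not converge")


LINES = [summarise(x, y) for x, y in SETS]
x_min = minimise_f(LINES)
f_min = f(LINES, x_min)
```

For these data the minimizing value lies between the two estimated intercepts, and `f_min` can be compared with $nK$ from the analytic special-case solution.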
To investigate the convergence of the iteration we consider the solution of the equation $x = \psi(x)$ by the iteration $x_{r+1} = \psi(x_r)$. We have for a root $x$

$$x_{r+1} - x = \psi(x_r) - x = \psi(x_r) - \psi(x) = (x_r - x)\,\psi'(\xi),$$

where $\xi$ lies in the range $[x_r, x]$. Thus if $|\psi'(\xi)| < 1$ for all $\xi$ in the neighbourhood of the root $x$, and $x_0$ is chosen in that neighbourhood, the iteration will converge at least as fast as a geometrical progression. This is only a sufficient condition. Necessary conditions are more involved. If $\psi'(x)$ is continuous, then it suffices if $|\psi'(x)| < 1$ at the root and $x_0$ is near enough.

In the application

$$\psi(x) = \frac{\sum_i \omega_i(x)\,x_{0i}}{\sum_i \omega_i(x)},$$

where

$$\omega_i(x) = \frac{n_i p_i b_i\{s_i + n_i(\bar x_i - x)(\bar x_i - x_{0i})\}}{\{s_i + n_i(\bar x_i - x)^2\}^2}.$$

We easily obtain $\psi'(x)$ at the root $x = x_0$,

$$\psi'(x_0) = \frac{\sum_i \omega_i'(x_0)(x_{0i} - x_0)}{\sum_i \omega_i(x_0)},$$

* I am indebted to E. C. Fieller for pointing out this interesting fact. 
and

$$\omega_i'(x_0) = n_i\,\omega_i(x_0)\left\{\frac{x_{0i} - \bar x_i}{s_i + n_i(\bar x_i - x_0)(\bar x_i - x_{0i})} - \frac{4(x_0 - \bar x_i)}{s_i + n_i(\bar x_i - x_0)^2}\right\} \tag{21}$$

$$\simeq -\frac{3n_i(x_{0i} - \bar x_i)}{s_i + n_i(\bar x_i - x_{0i})^2}\,\omega_i(x_0) = -\mu_i\,\omega_i(x_0), \text{ say.}$$

Note that $\mu_i$ is independent of $x_0$ and can be calculated direct from the data.

An approximate sufficient condition is

$$\Big|\sum_i \mu_i\,\omega_i(x_0)(x_{0i} - x_0)\Big| < \sum_i \omega_i(x_0);$$

since $\omega_i(x_0) > 0$ $(i = 1, 2, \ldots, m)$, this certainly holds if

$$|\mu_i| < \frac{1}{|x_{0i} - x_0|} \quad (i = 1, 2, \ldots, m). \tag{22}$$


Inserting the first approximation for $x_0$ in (22) gives approximate sufficient conditions for convergence. They can be compounded into a weaker condition by noticing that as $x_0$ lies in the range $R$ of the $x_{0i}$, (22) is satisfied if

$$\frac{1}{R} = \frac{1}{\operatorname{Max} x_{0i} - \operatorname{Min} x_{0i}} > \operatorname{Max}|\mu_i|. \tag{23}$$

The minimum value of $f(x)$ is obtained as

$$\sum_i \lambda_i(x)(x_{0i} - x)^2, \tag{24}$$

where

$$\lambda_i(x) = \frac{n_i p_i b_i}{s_i + n_i(\bar x_i - x)^2} \tag{25}$$

and $x$ is the exact root.


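Approximate confidence limits can be derived from the approximate form (17) of $f(x)$ as the roots of

$$\Big(\sum_i \omega_i\Big)x^2 - 2\Big(\sum_i \omega_i x_{0i}\Big)x + \Big(\sum_i \omega_i x_{0i}^2 - \frac{mS_0F_\alpha}{N-2m}\Big) = 0. \tag{26}$$

The exact equation to be solved is

$$\Big\{\sum_i \lambda_i(x)\Big\}x^2 - 2\Big\{\sum_i \lambda_i(x)\,x_{0i}\Big\}x + \Big\{\sum_i \lambda_i(x)\,x_{0i}^2 - \frac{mS_0F_\alpha}{N-2m}\Big\} = 0,$$

or, using this as a definition of the $W$'s, as

$$W_0(x)\,x^2 - 2W_1(x)\,x + W_2(x) = 0. \tag{27}$$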


This can be solved using the iteration

$$x_{n+1} = \frac{W_0(x_n)\,x_n^2 - W_2(x_n)}{2\{W_0(x_n)\,x_n - W_1(x_n)\}} \tag{28}$$

for each root in turn, using those of (26) as initial values.

The detailed investigation of convergence is rather complicated, but a simple account can be given neglecting the change of the $W$'s with $x$. In this case the iteration is simply

$$x_{n+1} = \frac{W_0x_n^2 - W_2}{2(W_0x_n - W_1)}.$$

This has a $\psi$ function

$$\psi(x) = \frac{W_0x^2 - W_2}{2(W_0x - W_1)}, \qquad \psi'(x) = \frac{W_0(W_0x^2 - 2W_1x + W_2)}{2(W_0x - W_1)^2}.$$

$\psi'(x)$ vanishes at the root and so the iteration converges very rapidly.
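A sketch of the iteration (28), started from the roots of the approximate quadratic (26), is given below for the data of the paper's numerical example. The helper names, tolerance and iteration cap are illustrative assumptions; the value $F_{0.05} = 5.14$ is the tabulated point quoted in that example.

```python
import math

X = [2278.0, 2406.0, 2524.0, 2638.0, 2760.0]
SETS = [
    (X, [0.802, 0.882, 0.956, 1.028, 1.101]),
    (X, [0.739, 0.812, 0.879, 0.944, 1.012]),
]
F_ALPHA = 5.14  # 5 % point of F on (2, 6) degrees of freedom, as quoted in the example


def summarise(x, y):
    """Return (n, xbar, s, S, p, b, x0i) for one line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    S = sum((yi - ybar) ** 2 for yi in y)
    p = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = p / s
    return n, xbar, s, S, p, b, xbar - ybar / b


lines = [summarise(x, y) for x, y in SETS]
m = len(lines)
N = sum(line[0] for line in lines)
S0 = sum(S - p * p / s for _, _, s, S, p, _, _ in lines)
c = m * S0 * F_ALPHA / (N - 2 * m)  # right-hand side of f(x) = m S0 F / (N - 2m)


def lam(line, x):
    """lambda_i(x) of (25)."""
    n, xbar, s, _, p, b, _ = line
    return n * p * b / (s + n * (xbar - x) ** 2)


def W(x):
    """W_0(x), W_1(x), W_2(x) as defined through (27)."""
    l = [lam(line, x) for line in lines]
    x0 = [line[6] for line in lines]
    return (sum(l),
            sum(li * xi for li, xi in zip(l, x0)),
            sum(li * xi * xi for li, xi in zip(l, x0)) - c)


def refine(x, tol=1e-12, max_iter=100):
    """Iteration (28), re-evaluating the W's at each step."""
    for _ in range(max_iter):
        W0, W1, W2 = W(x)
        x_new = (W0 * x * x - W2) / (2.0 * (W0 * x - W1))
        if abs(x_new - x) <= tol * max(1.0, abs(x_new)):
            return x_new
        x = x_new
    raise RuntimeError("iteration (28) did not converge")


# Starting values: the two roots of the approximate quadratic (26).
omega = [lam(line, line[6]) for line in lines]
A = sum(omega)
B = sum(w * line[6] for w, line in zip(omega, lines))
C = sum(w * line[6] ** 2 for w, line in zip(omega, lines)) - c
half_width = math.sqrt(B * B - A * C) / A
lower = refine(B / A - half_width)
upper = refine(B / A + half_width)
```

Because both filters share the same lamp distances, the limits found in this way can be compared with those given by the closed-form quadratic (15).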



















A NUMERICAL EXAMPLE 


In certain colour-temperature experiments on electric filament lamps, the power of a lamp 
is obtained from the response of a photoelectric cell to the illumination from the lamp 
for varying lamp to cell distances; the root reciprocal response is a linear function of this 
distance by the inverse square law of illumination. The ‘zero’ point of the cell is not known 
and is estimated from the intercept of the regression line. It is suspected that the zero point 
changes, with the type of lamp or the frequency of light filtered to the cell. The data below are 


from one experiment of this series using the same lamp at different distances with two 
different filters: 











Lamp distance        Root reciprocal response
x (cm.)              y_1 (Filter 1)    y_2 (Filter 2)

2278                 0.802             0.739
2406                 0.882             0.812
2524                 0.956             0.879
2638                 1.028             0.944
2760                 1.101             1.012

















Both lamp 'zeros' are about 1000 cm., and the large extrapolation involved causes the arithmetic to be rather heavy. The notation, exemplified by $7.139^{-4} = 7.139 \times 10^{-4} = 0.0007139$, is adopted throughout. We easily obtain

$$\sum_r x_r = 1.2606^{4}, \qquad \sum_r y_{1r} = 4.769^{0}, \qquad \sum_r y_{2r} = 4.386^{0},$$
$$\bar x = 2.5212^{3}, \qquad \bar y_1 = 0.9538^{0}, \qquad \bar y_2 = 0.8772^{0},$$
$$s = 1.43093^{5}, \qquad S_1 = 5.53768^{-2}, \qquad S_2 = 4.59868^{-2},$$
$$p_1 = 8.90132^{1}, \qquad p_2 = 8.11188^{1},$$
$$b_1 = 6.22066^{-4}, \qquad b_2 = 5.66896^{-4},$$
$$x_{01} = 9.87923^{2}, \qquad x_{02} = 9.73828^{2},$$
$$S_0 = 5.53386^{-6}.$$

Hence

$$ns = 7.15464^{5}, \qquad \sum p^2 \sum \bar y^2 - \Big(\sum p\bar y\Big)^2 = 5.05905^{-1},$$
$$\sum p\bar y = 1.56058^{2}, \qquad \sum p^2 + ns\sum \bar y^2 = 1.215921^{6}.$$

Equation (13) takes the form

$$7.15464^{5}K^2 - 1.215921^{6}K + 5.05905^{-1} = 0,$$

or

$$K^2 - 1.699485K + 7.071003^{-7} = 0,$$

i.e.

$$K = 4.1607^{-7}\{1 + O(10^{-7})\}$$

and

$$F_c = \frac{5 \times 6}{2 \times 5.53386^{-6}} \times 4.1607^{-7} = 1.128.$$

Thus the difference of intercepts is not significant.

Confidence limits on $x_0$ are given by (15). From tables of the $F$ distribution:

$$F_{0.05} = 5.14, \qquad K_{0.05} = \frac{2 \times 5.53386^{-6}}{5 \times 6} \times 5.14 = 1.896^{-6},$$

$$\delta = \frac{s}{\sum p^2 - nsK_{0.05}}\Big[\Big(\sum p\bar y\Big) \pm \Big\{\Big(\sum p\bar y\Big)^2 - \Big(\sum p^2\Big)\Big(\sum \bar y^2\Big) + \Big(\sum p^2 + ns\sum \bar y^2\Big)K_{0.05} - nsK_{0.05}^2\Big\}^{1/2}\Big]$$

$$= \frac{1.43093^{5}}{1.45023^{4}}\big[1.56058^{2} \pm \{-5.05905^{-1} + 2.3057 - 2.57^{-6}\}^{1/2}\big]$$

$$= 9.86694\,[1.56058^{2} \pm \sqrt{(1.79981)}]$$

$$= 1.53982^{3} \pm 1.3237^{1},$$

$$x_0 = 9.6815^{2} \text{ or } 9.9462^{2}.$$
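The arithmetic of this example can be reproduced from the tabulated responses. The following is a minimal sketch: variable names are illustrative, the value $F_{0.05} = 5.14$ is taken from the text rather than computed, and the smaller root of (13) is evaluated through the rewriting $2b/\{a + \sqrt{(a^2 - 4b)}\}$, which is an assumption of this sketch rather than the paper's series.

```python
import math

X = [2278.0, 2406.0, 2524.0, 2638.0, 2760.0]
Y = [
    [0.802, 0.882, 0.956, 1.028, 1.101],  # filter 1
    [0.739, 0.812, 0.879, 0.944, 1.012],  # filter 2
]
n, m = len(X), len(Y)
N = n * m
xbar = sum(X) / n
s = sum((x - xbar) ** 2 for x in X)


def summarise(y):
    """Return (ybar, S, p) for one filter."""
    ybar = sum(y) / n
    S = sum((yi - ybar) ** 2 for yi in y)
    p = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(X, y))
    return ybar, S, p


stats = [summarise(y) for y in Y]
S0 = sum(S - p * p / s for _, S, p in stats)
sum_p2 = sum(p * p for _, _, p in stats)
sum_y2 = sum(ybar * ybar for ybar, _, _ in stats)
sum_py = sum(p * ybar for ybar, _, p in stats)

# Equation (13) divided through by ns: K^2 - a K + b = 0.
a = (sum_p2 + n * s * sum_y2) / (n * s)
b = (sum_p2 * sum_y2 - sum_py ** 2) / (n * s)
K_c = 2.0 * b / (a + math.sqrt(a * a - 4.0 * b))  # smaller root
F_c = n * (N - 2 * m) / (m * S0) * K_c

# Confidence limits from the quadratic (15) at the 5 % level.
F_05 = 5.14
K_05 = m * S0 * F_05 / (n * (N - 2 * m))
scale = s / (sum_p2 - n * s * K_05)
root = math.sqrt(sum_py ** 2 - (sum_p2 - n * s * K_05) * (sum_y2 - K_05))
limits = (xbar - scale * (sum_py + root), xbar - scale * (sum_py - root))
```

Printing `F_c` and `limits` allows a direct comparison with the rounded figures quoted in the text.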


EXTENSION OF ANALYSIS TO A CLASSIFIED SET OF REGRESSION LINES 


If a significantly large variation of intercept is found by this test in data which are divided 
into groups, it is natural to ask whether the variation is only between the groups, with 
a common intercept in each group. 
Let there be $p$ groups of sets, the $i$th group consisting of $q_i$ sets, the $j$th set of which has $n_{ij}$ pairs $(x_{ijr}, y_{ijr})$ with a linear regression

$$E(y_{ijr}) = \alpha_{ij} + \beta_{ij}x_{ijr} \quad (r = 1, 2, \ldots, n_{ij};\ j = 1, 2, \ldots, q_i;\ i = 1, 2, \ldots, p), \tag{29}$$

and the same assumptions concerning the $x$'s and $y$'s as in the simpler problem.

The composite hypothesis $H'$ is that there are $p$ numbers $x_{01}, x_{02}, \ldots, x_{0p}$, such that

$$\alpha_{ij} + \beta_{ij}x_{0i} = 0 \quad (j = 1, 2, \ldots, q_i;\ i = 1, 2, \ldots, p). \tag{30}$$

A constituent simple hypothesis $H'(x_{01}, x_{02}, \ldots, x_{0p}) = H'(x_{0i})$ is that (30) is true for a given set $x_{01}, \ldots, x_{0p}$.

Using an obvious extension of the notation of (3) the minimized unrestricted sum of squares is

$$S_0 = \sum_i \sum_j (S_{ij} - p_{ij}^2/s_{ij}), \tag{31}$$

while the sum of squares due to restraints can be shown to be

$$\sum_i \sum_j \frac{n_{ij}p_{ij}b_{ij}(x_{0i} - x_{0ij})^2}{s_{ij} + n_{ij}(x_{0i} - \bar x_{ij})^2} = \sum_i f_i(x_{0i}), \tag{32}$$

where $b_{ij}$, $x_{0ij}$ are the estimated slope and intercept of the $j$th line of the $i$th group. These expressions are distributed as $\chi^2\sigma^2$ with

$$\sum_i \sum_j (n_{ij} - 2) = N - 2\sum_i q_i = N - 2Q$$

and $Q$ degrees of freedom respectively, and thus the significance level is given by

$$F_c = \frac{N-2Q}{QS_0}\operatorname{Min}\sum_i f_i(x_{0i}) = \frac{N-2Q}{QS_0}\sum_i \operatorname{Min} f_i(x_{0i}), \tag{33}$$

which has the $F$ distribution with $Q$ and $N - 2Q$ degrees of freedom.

The numerical processes described above can be applied to determine $F_c$. The minimizing values of $x_{0i}$ are maximum likelihood estimates of the common intercepts, but if confidence intervals are required, these can best be obtained from each group separately.










In a normal analysis of variance the difference between the two sums of squares calculated, assuming, first, one common intercept and, secondly, one for each group, has a $\chi^2\sigma^2$ distribution and can be used as a direct test of the significance of between-group variations. In this case the two sums of squares of a constituent simple hypothesis have the same degrees of freedom, and the set of simple hypotheses contained in $H$ is a subset of $H'$.

The further extension of the problem to multiple classification follows the lines already 
laid down and needs no further elaboration. 


REFERENCES 


BARNARD, G. A. (1947). Significance tests for 2 × 2 tables. Biometrika, 34, 123.

KOLODZIEJCZYK, ST. (1935). On an important class of statistical hypotheses. Biometrika, 27, 161.

NEYMAN, J. & PEARSON, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. A, 231, 289.













A SAMPLING TEST OF THE $\chi^2$ THEORY FOR
PROBABILITY CHAINS


By M. S. BARTLETT


University of Manchester 


In my paper on the $\chi^2$ theory for probability chains (Bartlett, 1951) it was stated that numerical examination of the proposed methods would be reported later. The purpose of the present note is to demonstrate numerically the relevance of the theory in the case of a simple stationary Markoff chain with two possible states. Two separate cases were considered with transition probabilities from one state to the other given by the columns of the matrices in Table 1. These transition probabilities are easily seen to give the following expected relative frequencies of consecutive pairs of states in a long sequence (Table 2). It was shown in my paper (p. 89) that $\chi^2$ theory may be applied, at least for sufficiently long sequences, to the two-way frequencies of Tables 1 and 2, but not (in contrast with the complete independence case) to the marginal frequencies alone. In case I the expected value of $\chi^2$ for the marginal frequencies was given as 0.5, in case II as 1.4 (for 1 nominal degree of freedom).

Table 1

                     Case I                  Case II
                     Initial state           Initial state
                     0        1              0        1
Final state 0        1/3      2/3            2/3      1/2
Final state 1        2/3      1/3            1/3      1/2
Total                1        1              1        1

Table 2

                     Case I                          Case II
                     Initial state                   Initial state
                     0        1        Total         0        1        Total
Final state 0        1/6      1/3      1/2           2/5      1/5      3/5
Final state 1        1/3      1/6      1/2           1/5      1/5      2/5
Total                1/2      1/2      1             3/5      2/5      1
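The step from Table 1 to Table 2 can be sketched with exact fractions. The function name and argument names are illustrative assumptions; the method is simply the stationary distribution of a two-state chain multiplied by the transition probabilities.

```python
from fractions import Fraction as Fr


def pair_frequencies(p01, p10):
    """Expected relative frequencies of consecutive pairs in a long sequence.

    p01 = P(final state 1 | initial state 0),
    p10 = P(final state 0 | initial state 1).
    Keys of the result are (initial state, final state).
    """
    pi0 = p10 / (p01 + p10)  # stationary probability of state 0
    pi1 = p01 / (p01 + p10)  # stationary probability of state 1
    return {
        (0, 0): pi0 * (1 - p01),
        (0, 1): pi0 * p01,
        (1, 0): pi1 * p10,
        (1, 1): pi1 * (1 - p10),
    }


case_I = pair_frequencies(Fr(2, 3), Fr(2, 3))
case_II = pair_frequencies(Fr(1, 3), Fr(1, 2))
```

Summing the entries over the final state gives the marginal probabilities 1/2, 1/2 in case I and 3/5, 2/5 in case II.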

















A sequence of 1000 values was obtained from the random numbers in the tables of Fisher and Yates, by arranging for the first value to occur according to the marginal probability, and the succeeding values to occur with the transition probabilities appropriate to each previous state. This was done for both cases I and II (and subsequently repeated for reasons given below). Each sequence was subdivided into ten subsequences of 100, thus giving rise to ten 2 × 2 tables. For example, the first 100 values obtained in case I were:

00010, 11001, 01010, 10010, 10010,
01010, 10111, 01100, 01101, 01000,
01100, 01101, 01111, 01110, 11001,
01010, 11011, 00111, 10101, 11010.


Table 3

            0_1      1_1      Total
0_2         15       32       47
1_2         32       20       52
Total       47       52       99


Table 3 shows the corresponding 2 × 2 table. (Since the total for $0_1$ does not include the final value, which is 0, and the total for $0_2$ the first value, which is 0, both totals in this sample agree at 47, one less than the total number of 0's.)

The expected frequencies for a total sample of 99 are obtained from Table 2 by multiplying the probabilities given there by this total frequency. A $\chi^2$ could then be worked out, the standard quadratic expression (no corrections for continuity) in the deviations of observed from expected frequencies being used. This value for the two-way frequencies is denoted by $\chi_2^2$. The similar value for the marginal frequencies (for definiteness, when the row and column values differed by unity, the column totals were used) is denoted by $\chi_1^2$, and an appropriate $\chi^2$ expression for the two-way frequencies is then

$$\chi^2 = \chi_2^2 - \chi_1^2 \quad \text{with 2 degrees of freedom.}$$

For Table 3 this gives $\chi^2 = 0.938 - 0.252 = 0.686$.
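The counts of Table 3 and the three quantities just quoted can be recomputed from the 100 values listed above. This is a sketch only: the variable names are illustrative, and the marginal totals are taken over the initial state of each pair, which is an assumption about which margin is meant by "column totals" (the two margins coincide in this sample).

```python
from collections import Counter

# The first 100 values obtained in case I, as listed in the text.
SEQ = "".join([
    "00010", "11001", "01010", "10010", "10010",
    "01010", "10111", "01100", "01101", "01000",
    "01100", "01101", "01111", "01110", "11001",
    "01010", "11011", "00111", "10101", "11010",
])

# Observed consecutive pairs (initial state, final state): 99 pairs in all.
pairs = Counter(zip(SEQ, SEQ[1:]))
total = sum(pairs.values())

# Expected pair probabilities for case I (Table 2).
expected_pair = {("0", "0"): 1 / 6, ("0", "1"): 1 / 3,
                 ("1", "0"): 1 / 3, ("1", "1"): 1 / 6}
chi2_two_way = sum((pairs[k] - total * q) ** 2 / (total * q)
                   for k, q in expected_pair.items())

# Marginal totals by initial state, each with expected probability 1/2.
margin = Counter(first for first, _ in zip(SEQ, SEQ[1:]))
chi2_marginal = sum((margin[state] - total / 2) ** 2 / (total / 2)
                    for state in "01")

chi2 = chi2_two_way - chi2_marginal
```

The results agree with the figures in the text to within rounding in the third decimal place.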


The complete set of values so obtained is given in Table 4. The average values are shown in brackets at the foot of Table 4, and it will be seen that, in spite of the usual fluctuations in individual values, these average values seem to agree reasonably well with the expected values (for $\chi_1^2$ in case I, 0.5; in case II, 1.4; for $\chi^2$ in either case, 2.0). The total of all $\chi^2$'s in case I is 37.66 (40 D.F.) and in case II 51.80, both values being quite admissible. On the other hand, the total of $\chi_1^2$'s in case I is 9.32 (nominally 20 D.F.), and in case II 40.82. The former is significantly low at the 0.05 level, and the latter significantly high at the 0.01 level.

Although the asymptotic expected value of $\chi_1^2$ is known, its distribution is not. However, we might as a further test demonstrate that while the values of $\chi^2$ are not significantly different for cases I and II, those of $\chi_1^2$ are, clustering round 0.5 and 1.4 respectively. It seemed advisable from the nature of the distributions to carry out these tests on the square-root scale, in order to reduce the effect of skewness. In testing a difference, not between two observed means, but between a mean and a theoretical value, it also seemed advisable to use the correct mean on the square-root scale, which is

$$\sqrt{2}\,\Gamma\{\tfrac12(n+1)\}/\Gamma(\tfrac12 n)$$











for a $\chi^2$ distribution with $n$ degrees of freedom. For $\chi_1^2$ it was assumed that the correction factor 0.798 appropriate to the nominal number of degrees of freedom (namely, one) could be used; the test is not of course very dependent on this particular assumption. This gives expected values on the square-root scale of 1.253 for $\chi^2$, 0.564 for $\chi_1^2$ in case I and 0.945 in case II. The relevant $t$ statistics came out as follows:

Case I, $\chi^2$ against theoretical mean        t (19 D.F.) = -0.693
Case II, $\chi^2$ against theoretical mean       t (19 D.F.) = 1.938
Case I versus case II                            t (38 D.F.) = -1.745
Case I, $\chi_1^2$ against theoretical mean      t (19 D.F.) = -0.011
Case II, $\chi_1^2$ against theoretical mean     t (19 D.F.) = 1.222
Case I versus case II                            t (38 D.F.) = -3.190.
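The expected values on the square-root scale quoted above follow from the gamma-function expression for the mean of $\sqrt{\chi^2}$. A brief sketch, with a hypothetical function name, assuming (as the text does) that the one-degree-of-freedom factor is simply applied to the square root of the expected marginal value:

```python
import math


def mean_root_chi2(n):
    """Mean of the square root of a chi-squared variate with n degrees of freedom."""
    return math.sqrt(2.0) * math.gamma((n + 1) / 2.0) / math.gamma(n / 2.0)


factor = mean_root_chi2(1)                 # correction factor for one degree of freedom
expected_chi2 = mean_root_chi2(2)          # two-way chi-squared, 2 degrees of freedom
expected_case_I = factor * math.sqrt(0.5)  # marginal value, case I
expected_case_II = factor * math.sqrt(1.4) # marginal value, case II
```

These reproduce the factor 0.798 and the values 1.253 and 0.564; the case II figure comes out within about 0.001 of the quoted 0.945.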
Table 4

          Case I                            Case II
χ_2^2     χ_1^2     χ^2           χ_2^2     χ_1^2     χ^2

 0.938    0.252    0.686          4.581    1.833    2.748
 1.484    0.252    1.232         10.949    5.470    5.479
 3.667    0.252    3.415          3.475    0.750    2.725
 5.637    0.818    4.819          0.545    0.285    0.260
 0.999    0.090    0.909          5.394    0.108    5.286
 8.182    1.708    6.474          0.343    0.082    0.261
 1.999    0.494    1.505          2.968    0.890    2.078
 0.908    0.090    0.818          9.838    5.663    4.175
11.667    2.920    8.747          1.228    0.545    0.683
 1.999    0.494    1.505          7.287    5.663    1.624
 2.394    0.252    2.142          0.546    0.108    0.438
 0.454    0.090    0.364          5.666    0.815    4.851
 1.999    0.818    1.181          1.455    0.545    0.910
 0.938    0.252    0.686          5.324    2.305    3.019
 0.030    0.010    0.020          1.176    0.007    1.169
 2.727    0.252    2.475          8.399    4.729    3.670
 0.272    0.090    0.182         10.419    5.470    4.949
 0.181    0.090    0.091         10.418    5.470    4.948
 0.060    0.010    0.050          0.243    0.007    0.236
 0.454    0.090    0.364          2.364    0.082    2.282

(2.350)  (0.466)  (1.883)        (4.631)  (2.041)  (2.590)


























The last value of $t$ is highly significant, as anticipated. It should be mentioned that at first only the ten sets of 100 making the first sequence of 1000 in each case were considered, and the corresponding value of $t$ at that stage was 2.405 (18 D.F.). It was felt that this value, though significant, might not be sufficiently convincing, and so a second sequence of 1000 in each case was added. This sequential decision somewhat vitiates the nominal significance level for the final $t$ value, but the evidence for the anticipated difference in $\chi_1^2$ in cases I and II (the difference being in the correct direction) seems clear enough.

The rather high, though insignificant (insignificant also for the first sequences alone), values of $t$ occurring in the test of $\chi^2$ are mainly due to the total $\chi^2$ in case II exceeding expectation, and it will be remembered that the more exact direct test of the total $\chi^2$ gave less cause for suspicion, giving an equivalent normal fluctuation (of unit standard deviation) of 1.27. It is, of course, possible that in finite samples the values of $\chi^2$ are somewhat less restricted when $\chi_1^2$ has a larger average value; to settle this point, a more extensive investigation would be required.


I am indebted to Mrs G. W. Walls for assistance in this investigation. 


REFERENCE 


BARTLETT, M. S. (1951). The frequency goodness of fit test for probability chains. Proc. Camb. Phil. Soc. 47, 86.













ON MATHEMATICAL ANALYSIS OF STYLE 
By WILHELM FUCKS, Aachen 


1. INTRODUCTION 


Every significant text of a grammatical exposition consists of a certain material, the 
vocabulary, and some structural properties, the style, of its author. The passive vocabulary 
is formed by the totality of all words of that language, s, the author writes in, the active 
vocabulary is formed by a certain set, s’, of that totality, the selection of which is determined 
essentially by the sort of literature the text belongs to and depends only in a lower degree 
on the peculiarity of the author. Style, however, is characteristic of the author at a certain 
period of his personal development. 

The aim of the following investigation is to formulate mathematically some of the 
properties of structure constituting style, so that for a given text the application of a simple 
mathematical criterion allows its attribution to a particular author at a certain period of his 
mental development. 


2. THE METHODOLOGY OF STYLE ANALYSIS 


The analysis of the structural properties of language can be performed in two principal ways: 

(1) synthetically, by analysing the formation of a text and its structure by means of its 
elements; 

(2) analytically, by analysing a given text as to the structural properties of its elements. 

The synthetic method has gained considerable importance in communication engineering 
through the ‘information theory’. It represents the so-to-speak ‘atomistic’ foundation of 
the more ‘phenomenologic’ methods of the analytical treatment which constitutes the 
proper field of style analysis. The relation between the two methods is analogous to the 
relation between the methods of statistical mechanics and thermodynamics, and it will be 
seen that this analogy is not only formal, but has a deeper significance. 


3. THE ANALYTICAL INVESTIGATION OF LANGUAGE 


The elements of the totality of significant grammatical manifestations can be ordered in the 
following sequence, each element of which contains the preceding one as a sub-element: 


letter 

syllable 

word 

sentence 

paragraph 

chapter 

book 

work 

total work of an author 

kind of literature within a language 
total literature of a language 
kind of literature 

world literature 














The text of a grammatical manifestation can be conceived as a linear set, the elements of which may be numbered by a position number n. Now we consider functions f(n) defined on this set and seek for relations between the functional values representing characteristic structural properties of the text. Such functions are, for example, the number of letters or syllables of the nth word, the number of words in the nth sentence, the metrum, and so on.

As numerical characteristics of the relations between the functional values, we have to 
consider the average number of letters or syllables per word, the relative frequencies of the 
one-syllabled, two-syllabled, etc., words, the average distances between an i-syllabled and 
a k-syllabled word and so on. Which of these numerical characteristics will be peculiar to 
style can only be deduced from a more profound inquiry based on a new conception of text 
structure. 

We distinguish, for the moment only qualitatively, prose and poetic language. Between 
these extremes there are continuous transitions. We will try to comprehend quantitatively 
the ‘binding character’ by a numerical characteristic. By an analogy with physics we can 
regard the states of the text-elements before the formation of text as ‘gaseous’, in prose-text 
as ‘linearly fluid’ and in poetic text as ‘linearly crystalline’. The formation process of text 
then corresponds with condensation (transition gaseous-fluid in case of prose, gaseous-solid 
in case of poetic text, respectively). This analogy brings out which mathematical aids of 
style analysis are to be applied in a particular case. The properties of symmetry and 
periodicity of the ‘crystalline’ state make group theory the appropriate instrument; and, 
indeed, it has already been brought into style-analytical investigations (cf. Bense, 1949). The 
‘gaseous’ and ‘fluid’ states, however, are only accessible to a statistical or phenomenologic- 
thermodynamic treatment. Statistical methods of style analysis using words as elements 
have been developed by G. U. Yule (1939, 1944). As numerical characteristics specific for 
style he gives the average number of words per sentence (average sentence lengths), and the 
frequency distribution of substantives, i.e. the number of substantives occurring once, 
twice, and so on, respectively. 


4. THE SYNTHETIC INVESTIGATION OF LANGUAGE 


We regard the formation process of a text as an ordered arrangement of language elements 
selected in a certain way from a stock, V, of elements. As language elements there must be 
taken into consideration: letters (V = alphabet), syllables (V = syllable stock), words 
(V = vocabulary), sentences (V = sentence set). We suppose the stock to contain a sufficient 
number of samples of each element. While the statistical investigations of language have 
so far employed words as elements, we will here choose syllables as the natural elements of 
language. In communication engineering the terms ‘Sprachatome’ or ‘Logatome’ are used. It is well known that communication engineering utilizes these natural elements of speech-flux to define the ‘Silbenverständlichkeit’ as a measure for the efficiency of a transmitting system. Now we imagine the words $a_n^i$ occurring in the vocabulary V as balls, marked with the syllable number i, contained in the urn V with certain relative frequencies $p_i$ of the marks i. The formation of a text then represents a certain succession of drawings of the balls $a^i$ out of the urn V and composition in form of a sequence

    i, j, ..., l, ... = syllable numbers,
    $a_1^i\, a_2^j\, a_3^k \ldots a_n^l \ldots$,
    1, 2, ..., n, ... = position numbers of the words a.











124 | On mathematical analysis of style 


The drawings and consequently the succession of the words or their marks in the text are, 
however, in general not stochastically independent, but there exist conditioned probabilities 
for the occurrence of certain configurations of words: 
$$w(a^i_1, a^j_2), \quad w(a^i_1, a^j_2, a^k_3),$$

and so on, and certain configurations of marks:

$$w(i, j), \quad w(i, j, k),$$
and so on, respectively. These very probabilities represent characteristic structural pro- 
perties of the text. Conversely, if these probabilities are given for a stock V of elements 
with definite relative frequencies, the drawing process described above produces an artificial 
language with the prescribed structural properties. This synthesis of language means 
therefore a stochastic process in the form of a ‘Markoff-Kette’. 

Examples for a sequence of such artificially produced languages, which approach the 
English language, are given by C. E. Shannon (1948). 
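
The drawing process can be sketched for the simplest case, in which the mark of each word depends only on the mark of the word before it (a first-order chain). This is an illustration only: the transition probabilities below are invented and are not estimated from any text, and the name `generate_marks` is hypothetical.

```python
import random

def generate_marks(start, transition, length, seed=0):
    """Draw a sequence of syllable numbers from a first-order Markov chain.

    transition[i][j] is the assumed conditional probability that a word
    with j syllables follows a word with i syllables.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    marks = [start]
    while len(marks) < length:
        row = transition[marks[-1]]
        nxt = rng.choices(list(row.keys()), weights=list(row.values()), k=1)[0]
        marks.append(nxt)
    return marks

# Hypothetical conditional probabilities for marks 1, 2 and 3.
transition = {
    1: {1: 0.70, 2: 0.20, 3: 0.10},
    2: {1: 0.80, 2: 0.15, 3: 0.05},
    3: {1: 0.90, 2: 0.10},
}

text_marks = generate_marks(1, transition, 20)
print(text_marks)
```

A chain of this kind reproduces only short-range order (‘Nahordnung’), which is the limitation discussed in the next paragraph.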

The synthetical investigation of language gives a hint as to the direction in which 
characteristics of style may be found analytically. The consideration of conditioned pro- 
babilities for the occurrence of polynomial sequences of elements is not possible because of 
the difficulty of computation. Thus the synthetical method can generate artificial languages 
with ‘Nahordnung’ only. For style analysis, however, the very relations of connexion and 
succession of text elements over a wider range are important. In order to characterize them, 
there are (inter alia) the following possibilities: 

(1) the description of correlation between the marks at different positions of the text by 
means of an appropriate measure of correlation, 

(2) the estimation of average distances between two equal or different marks in the text. 

In our thermodynamic analogy the word-stock V, or the text, corresponds to a gas 
mixture or a fluid mixture (prose) or a mixed crystal (poetic form), respectively, consisting
of components with different marks. The 1-syllabled, 2-syllabled, ..., words correspond
perhaps to 1-atomic, 2-atomic, ..., molecules. They can exist in the text in different
states of order, they can be distributed at random or only possess ‘Nahordnung’, or 
perhaps be ordered over a wider range in a superstructure. By this means the binding 
character of language will be shown by the extent of a superstructure such as this, which in 
turn can be ascertained by estimating the correlation between the components at different 
positions as well as by determining the average distances—that is, the mean paths in the 
linear fluid or the average lattice constant in the linear crystal. 

Further, the order-disorder relations can be comprehended quantitatively by an 
appropriately defined entropy-function. 


5. COMPUTATION OF STYLE-CHARACTERISTICS 


(5.1). The syllable measures


First of all we determine the distribution of syllable numbers, that is to say, the relative
frequencies $p_i$ of the i-syllabled words. Thus, if $A_i$ denotes the number of the i-syllabled
words and N the total number of words,

$$p_i = \frac{A_i}{N}. \qquad (1)$$


By counting, the values shown in Table 1 were obtained. 













WILHELM Fucks 125 
Table 1. Percentage distribution of i-syllabled words

                                                      100 x $p_i$ (%)
Author        Work                             i=1     i=2     i=3     i=4     i=5     i=6
Shakespeare   Othello                         78.81   15.11    4.95    1.16     -       -
Galsworthy    Swan Song I                     75.18   17.55    5.58    1.39     -       -
              Swan Song II                    75.54   17.40    5.93    1.13     -       -
              Forsyte Saga                    73.44   18.36    6.06    1.48     -       -
Huxley        Brave New World                 69.82   19.64    6.96    2.43     -       -
              Antic Hay                       68.68   19.02    7.93    2.61     -       -
Rilke         Cornet                          62.34   30.30    5.99    1.34     -       -
Carossa       Geheimnisse des reifen Lebens   50.13   31.94   11.99    4.35    1.17    0.35
Hesse         Steppenwolf                     51.81   31.27   11.22    4.29    1.08    0.26
Mann          Buddenbrooks                    48.60   33.00   12.20    4.79    1.14    0.27
              Zauberberg                      52.91   29.00   11.62    4.38    1.37    0.37
Jaspers       Der philosophische Glaube       50.54   25.04   12.40    7.45    3.24    0.79






































It can be seen from this table that an increase in the percentage of many-syllabled words 
is associated with an increase in the prose character of the text. This phenomenon will affect 
the following numerical characteristics, all of which are computed from the p,. 

The average syllable number per word is defined by

$$\bar{i} = \sum_i i\, p_i. \qquad (2)$$

Values of $\bar{i}$ are given in Table 2.


Table 2. Average syllable number per word

Author        Work               $\bar{i}$      Author    Work                 $\bar{i}$
Shakespeare   Othello            1.287          Rilke     Cornet               1.464
Galsworthy    Swan Song I        1.326          Carossa   Geheimnisse          1.732
              Swan Song II       1.338          Hesse     Steppenwolf          1.721
              Forsyte Saga       1.342          Mann      Buddenbrooks         1.807
Huxley        Brave New World    1.397                    Zauberberg           1.723
              Antic Hay          1.409          Jaspers   Der philos. Glaube   1.885


























By this means we ensure, in contrast to the relative frequencies, that a particular work 
of a particular author is associated with a single characteristic number. Furthermore, this 
average syllable number per word is peculiar to the author. 
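
As a check on the definition, the average syllable number can be recomputed from a row of Table 1. The sketch below assumes only that the row holds the percentages 100 x p_i for i = 1, 2, ...; the function name `mean_syllables` is a label chosen here. Small differences from Table 2 are to be expected, since the printed percentages are rounded.

```python
def mean_syllables(percentages):
    """Average syllable number per word from percentages 100 * p_i, i = 1, 2, ..."""
    return sum(i * pct / 100.0 for i, pct in enumerate(percentages, start=1))

# Row for Rilke's Cornet as printed in Table 1 (i = 1 to 4).
cornet = [62.34, 30.30, 5.99, 1.34]
cornet_mean = mean_syllables(cornet)
print(round(cornet_mean, 3))   # close to the 1.464 printed in Table 2
```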


(5.2). The average distances


The average distance $l_{ik}$ between an i-syllabled and a k-syllabled word is defined by the
mean value of the differences of position numbers between an i-syllabled and the next
k-syllabled word.











For example, the matrix of mean paths for Rilke’s Cornet reads:

$$l_{ik} = \begin{pmatrix}
1.605 & 1.081 & 1.485 & 1.594 \\
1.065 & 3.422 & 3.113 & 3.704 \\
1.476 & 2.933 & 16.83 & 14.38 \\
1.515 & 4.111 & 13.72 & 77.13
\end{pmatrix}$$


Since we see that the diagonal elements show mutually the most relevant differences we 
shall use them only, not the whole matrix. Moreover, this brings considerable advantage 
in computation of the average distances, because of the following simple relation for the 
relative frequencies p,. 


The syllable number i is a function, i = f(n), of the word position n. The distance $l^{(1)}$
between two succeeding i-syllabled words is defined by

$$f(n + l^{(1)}) = f(n) = i. \qquad (3)$$

Further

$$f(n + l^{(1)} + l^{(2)} + \ldots) = f(n) = i. \qquad (4)$$

If N is the total number of words and $A_i$ the number of i-syllabled words contained in them,
if $n_a$ is the number of words before the first i-syllabled, and $n_e$ the number of words after
the last i-syllabled word (up to N), then

$$n_a + l^{(1)} + l^{(2)} + \ldots + l^{(A_i - 1)} + n_e = N - 1. \qquad (5)$$

The mean path is defined by

$$\bar{l}_{ii} = \frac{1}{A_i - 1} \sum_{\nu = 1}^{A_i - 1} l^{(\nu)}
  = \frac{N - (1 + n_a + n_e)}{A_i - 1} \to \frac{N}{A_i}
  \quad \text{for } N \to \infty \text{ and } p_i \text{ fixed}. \qquad (6)$$

Therefore

$$\bar{l}_{ii} = \frac{1}{p_i}. \qquad (7)$$


This relation makes it possible to compute directly from the relative frequencies the mean
distances between equal-syllabled words, that is to say, the diagonal elements of the
$l_{ik}$-matrices. Table 3 shows the values obtained.
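
The counting argument behind the mean path can be checked on a short sequence of marks. In this sketch the names `n_a` and `n_e` (words before the first and after the last i-syllabled word) and both function names are labels chosen here for illustration. The directly averaged gaps and the closed form (N - (1 + n_a + n_e)) / (A_i - 1) must agree exactly; the limit 1/p_i is approached only for long texts.

```python
def mean_gap(marks, i):
    """Average difference of position numbers between successive i-syllabled words."""
    positions = [n for n, m in enumerate(marks, start=1) if m == i]
    if len(positions) < 2:
        raise ValueError("need at least two i-syllabled words")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

def mean_gap_closed_form(marks, i):
    """The same quantity obtained from N, A_i, n_a and n_e only."""
    positions = [n for n, m in enumerate(marks, start=1) if m == i]
    N, A = len(marks), len(positions)
    n_a = positions[0] - 1        # words before the first i-syllabled word
    n_e = N - positions[-1]       # words after the last i-syllabled word
    return (N - (1 + n_a + n_e)) / (A - 1)

marks = [2, 1, 1, 3, 1, 2, 1, 1]  # made-up syllable numbers of eight words
print(mean_gap(marks, 1), mean_gap_closed_form(marks, 1))
print(len(marks) / marks.count(1))  # 1 / p_1, the large-N limit
```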

These results have the advantage of exhibiting, in contrast to the relative frequencies, 
the characteristic differences in the distribution of the poly-syllabled words. 

Hence we can deduce another numerical characteristic, the ‘trace’ or ‘spur’ of the
$l_{ik}$-matrix

$$s = \sum_i l_{ii}. \qquad (8)$$


We can only make comparisons as far as we have calculated the average distances. Thus we 
carry out the summation in all cases up to i = 4 (see Table 4). 

This trace, which is introduced to get again a single characteristic from the sequence of 
numerical characteristics of the mean paths, is also a very specific one, for its values for 
the different works of one author lie close together. 
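
Since each diagonal element is the reciprocal of a relative frequency, the trace up to i = 4 can be sketched directly from a row of Table 1. The function name is a label chosen here, and because the printed percentages are rounded the result only approximates the corresponding entry of Table 4.

```python
def trace_from_percentages(percentages, upto=4):
    """Sum of the mean paths 1 / p_i for i = 1 .. upto, with p_i given in per cent."""
    return sum(100.0 / pct for pct in percentages[:upto])

# Galsworthy, Swan Song I, as printed in Table 1.
swan_song_1 = [75.18, 17.55, 5.58, 1.39]
trace_swan_song_1 = trace_from_percentages(swan_song_1)
print(round(trace_swan_song_1, 2))   # Table 4 prints 96.80 for this work
```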










































































































Table 3. Average distances between equal-syllabled words

           Rilke     Carossa       Hesse         Mann           Mann         Jaspers
           Cornet    Geheimnisse   Steppenwolf   Buddenbrooks   Zauberberg   Phil. Glaube
$l_{11}$    1.604     1.995         1.930         2.058          1.890        1.979
$l_{22}$    3.300     3.130         3.198         2.897          3.449        3.994
$l_{33}$   19.28      8.340         8.914         8.189          8.604        8.063
$l_{44}$   74.10     23.01         23.34         20.90          22.83        13.43
$l_{55}$     -       85.40         92.42         87.95          73.21        30.86

           Shakespeare   Galsworthy   Galsworthy   Galsworthy   Huxley        Huxley
           Othello       Swan S. I    Swan S. II   Forsyte S.   Brave N. W.   Antic Hay
$l_{11}$    1.269         1.330        1.324        1.362        1.432         1.456
$l_{22}$    6.617         5.698        5.747        5.447        5.094         5.259
$l_{33}$   20.21         17.94        16.86        16.51        14.38         12.61
$l_{44}$   79.55         71.84        70.13        67.75        41.14         38.39

Table 4. Trace of the mean-path matrices

Author        Work               Spur s      Author    Work                 Spur s
Shakespeare   Othello            107.65      Rilke     Cornet               99.09
Galsworthy    Swan Song I         96.80      Carossa   Geheimnisse          36.48
              Swan Song II        94.06      Hesse     Steppenwolf          37.38
              Forsyte Saga        91.07      Mann      Buddenbrooks         34.04
Huxley        Brave New World     62.04                Zauberberg           36.78
              Antic Hay           57.72      Jaspers   Der Philos. Glaube   27.47

(5.3). Entropy


An especially characteristic measure for the order-disorder relations in a text should be
the statistical entropy. It is defined by

$$S = -k \sum_i p_i \log p_i, \qquad (9)$$

the $p_i$ being relative frequencies and k an arbitrary constant. In our thermodynamic
conception of text as a mixture of components i, with the frequencies $p_i$, S just represents
the ‘mixture entropy’. This entropy expression holds for independent text elements,
i.e. systems ‘without interrelation’. If we wish to consider the interrelations, we have to
write

$$S = -k \sum_{i,k} p_{ik} \log p_{ik}, \text{ etc.}, \qquad (10)$$

where $p_{ik}$ denotes the relative frequencies of pairs: i, k.





With the values of Table 1, the entropy values given in Table 5 are computed (S was
calculated with the aid of Briggs’s logarithms, i.e. logarithms to base 10).


Table 5. Entropy values

Author        Work               Entropy S      Author    Work                 Entropy S
Shakespeare   Othello            0.2940         Rilke     Cornet               0.3836
Galsworthy    Swan Song I        0.3215         Carossa   Geheimnisse          0.4783
              Swan Song II       0.3233         Hesse     Steppenwolf          0.4710
              Forsyte Saga       0.3344         Mann      Buddenbrooks         0.4864
Huxley        Brave New World    0.3675                   Zauberberg           0.4703
              Antic Hay          0.3777         Jaspers   Der Philos. Glaube   0.4968














We see that the entropy values are most characteristic of the different authors and
moreover describe quantitatively the order, or binding, relations. In the case of English
authors, the entropy values increase in the sequence

Shakespeare (0.29), Galsworthy (0.32-0.33), Huxley (0.37-0.38),

according to the transition from the poetic form (higher order) to prose (lower order); in the
case of German authors they increase in the sequence

Rilke (0.38), Hesse (0.47), Mann (0.47-0.49), Carossa (0.48), Jaspers (0.50).


Further, we can make the general statement that authors writing in German are to be 
found on the whole at higher entropy values than authors writing in English. This means, 
if we are allowed to generalize from our scarce material, that the German language has, 
on the whole, a lower order-character than the English language. 

Entropy, therefore, fulfills all requirements necessary to a numerical characteristic of 
style: it is specific, conceptually clear and easy to compute. 
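
How easily the entropy is computed can be seen from a short sketch. It assumes k = 1 and logarithms to base 10 (Briggs’s logarithms), and takes the percentages of Table 1 as input; the function name `entropy` is chosen here for illustration.

```python
import math

def entropy(percentages, k=1.0):
    """Mixture entropy -k * sum p_i * log10(p_i), with p_i supplied in per cent."""
    return -k * sum(
        (pct / 100.0) * math.log10(pct / 100.0)
        for pct in percentages
        if pct > 0          # empty classes contribute nothing
    )

# Rilke's Cornet, as printed in Table 1.
cornet_entropy = entropy([62.34, 30.30, 5.99, 1.34])
print(round(cornet_entropy, 4))   # Table 5 prints 0.3836 for this work
```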


(5.4). A characteristic diagram


We now plot entropy S against the trace s of the mean-path-matrix (Fig. 1) and get 
a representation in which every author corresponds to a certain area. The authors of the 
same language are situated on a certain curve track or curve strip of the (S,s) diagram. This 
suggests that we get a group of curves or strips with language as a parameter. 


6. SUMMARY 


The analysis of style, based on the distribution of syllable numbers in the text and on the 
numerical characteristics deducible from it, yields the result that every work of every 
author in every language can be associated with a sequence of numbers which characterize 
certain structural properties of the texts. This characterization is, of course, only possible 
with respect to the relations of order and coherence of the text elements (syllables), while 
the ‘Sinngehalt’ of the text, which is not accessible to such a simple description, will not 
be covered. Nevertheless, the possibility of a quantitative classification which is very 
simple to realize has been shown. This yields propositions not only about the peculiarities 
of style structure of a certain author, but also about the ‘carrying structure’ of the language 
on which the individual structure of the author is impressed. 







The analogy with respect to the atomistic structure of the solid, liquid and gaseous states 
which we have repeatedly emphasized for the sake of clearness, is based on the common 
statistical nature of the underlying states and processes. 





























[Figure: entropy S (vertical axis, 0.2 to 0.5) plotted against trace s (horizontal axis, 0 to 100).
Plotted points are labelled Jaspers, Carossa, Hesse, Mann, Rilke, Huxley (A.H., B.N.W.),
Galsworthy and Shakespeare.]

Fig. 1. The characteristic diagram.


The work done on mathematical analysis of style and the collection of material in this 
respect made by the author go back over a considerable time; the book by Yule (1944) 
was made known to the author by a footnote in Nature (1949, 163, 688) after most of the 
material of this paper had been collected. 


The author is indebted to his assistant Dipl.-Phys. Wilhelm Frahn for valuable help in 
condensing his ideas and translating this paper into English and to cand. phys. Herta 
Weidemann, cand. phys. Helmut Weymann and Emmy Heutz for valuable help in providing 
the material to establish the frequency distributions. 


REFERENCES 


BENSE, MAX (1949). Konturen einer Geistesgeschichte der Mathematik, 2. Hamburg: Claassen & Goverts.

SHANNON, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379.

YULE, G. U. (1939). On sentence-length as a statistical characteristic of style in prose: with applications
to two cases of disputed authorship. Biometrika, 30, 363.

YULE, G. U. (1944). The Statistical Study of Literary Vocabulary. Cambridge University Press.













[ 130 ] 


COMPARISON OF TWO APPROXIMATIONS TO THE DISTRIBUTION 
OF THE RANGE IN SMALL SAMPLES FROM 
NORMAL POPULATIONS 


By E. S. PEARSON
University College, London 


1. INTRODUCTION 


Recently two approximations have been suggested for the distribution of the range in
random samples from a normal population, both of which make it possible to use this
statistic as an estimator of the standard deviation $\sigma$, in place of the usual root-mean-square
estimator s, in a number of standard tests. The first of these approximations was published
by Cox (1949) and the second by Patnaik (1950); while the former makes use of the distribution
of $\chi^2$, the latter makes use of that of $\chi$. Writing w for the range in a sample of size n
from a normal population with standard deviation $\sigma$, and using comparable notations,
Patnaik puts

$$w/\sigma = c_1 \chi / \sqrt{\nu_1}, \qquad (1)$$

while Cox puts

$$w/\sigma = c_2 \chi^2 / \nu_2, \qquad (2)$$

where $c_1$ and $c_2$ are scale factors and $\nu_1$ and $\nu_2$ are the ‘equivalent’ (fractional) degrees of
freedom for $\chi$ (or $\chi^2$), all four constants being functions of n. The values of these constants
are determined so that the first two moments of $w/\sigma$ and $\chi$ (or $\chi^2$) agree.

While the value of $\nu_2$ is obtained readily from the relation $\nu_2 = 2/V_n^2$, where $V_n$ is the
coefficient of variation of w in samples of n, the value of $\nu_1$ has to be obtained by inversion
of the equation

$$V_n^2 = \tfrac{1}{2}\nu_1 \{\Gamma(\tfrac{1}{2}\nu_1) / \Gamma(\tfrac{1}{2}\nu_1 + \tfrac{1}{2})\}^2 - 1. \qquad (3)$$

Florin (1950) has given the approximate formula

$$\nu_1 = A^{-1} + 0.25 - \tfrac{3}{16}A + \tfrac{3}{64}A^2 \ldots, \qquad (4)$$

where $A = 2V_n^2$. It will be seen that, as n increases,

$$\nu_2 \sim 4\nu_1 - 1.$$
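
The inversion of equation (3) is easy to carry out numerically. The sketch below is not the procedure used in the paper: it solves the equation by bisection and compares the root with the series (4), taking the series coefficients as 3/16 and 3/64, which is what a large-ν expansion of equation (3) gives. The function names are chosen here, and the input V² = 2/10.95 is derived from the value ν₂ = 10.95 quoted for n = 4 in Table 3.

```python
import math

def cv2_chi(nu):
    """Squared coefficient of variation of chi with nu degrees of freedom (equation 3)."""
    log_ratio = math.lgamma(0.5 * nu) - math.lgamma(0.5 * nu + 0.5)
    return 0.5 * nu * math.exp(2.0 * log_ratio) - 1.0

def nu1_exact(v2, lo=0.01, hi=1.0e4, tol=1e-10):
    """Invert equation (3) by bisection; cv2_chi decreases steadily as nu grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cv2_chi(mid) > v2:
            lo = mid       # variation still too large, so nu must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nu1_florin(v2):
    """Series approximation (4) with A = 2 * V^2."""
    a = 2.0 * v2
    return 1.0 / a + 0.25 - 3.0 * a / 16.0 + 3.0 * a * a / 64.0

v2_n4 = 2.0 / 10.95                  # V^2 for n = 4, from nu_2 = 2 / V^2
nu1_root = nu1_exact(v2_n4)
nu1_series = nu1_florin(v2_n4)
print(round(nu1_root, 2), round(nu1_series, 2))   # Table 3 quotes 2.93 for n = 4
```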

An implication of the approximation (1) is that $w/c_1$ may be treated as the usual root-mean-square
estimator s, if assigned appropriate degrees of freedom $\nu_1$. These, as one would
expect, are always rather less than n - 1. It follows that $w/c_1$ may be used in place of s in
any test involving a ‘Studentized’ ratio, the penalty being a reduction in the degrees of
freedom with which the error standard deviation is estimated. Cox’s approximation does
not lend itself to the same procedure since the two sides of equation (2) are of different
dimensions, nor can the same interpretation be given to $\nu_2$. However, in the sequential
problems with which he was concerned this was not of any consequence.

Both Cox and Patnaik showed that their approximations would be of special value in
estimating $\sigma^2$ or $\sigma$ from the mean range $\bar{w}$ in, say, k independent samples of n observations.*
If Cox’s approximation is used, the mean range will be distributed, apart from a scale factor,
as $\chi^2$ with $k\nu_2$ degrees of freedom. Patnaik could not make use of the additive property of

* If the samples are of unequal size, the $\chi^2$-approximation is straightforward; H. A. David (1951,
p. 403) has also dealt with the $\chi$-approximation in this case and has given a table (loc. cit. p. 409) to
assist in computing the weighting factors to be used in combining the sample ranges.































E. S. PEARSON 131


$\chi^2$, but assumed that $\bar{w}/\sigma$ was distributed as $c_1 \chi / \sqrt{\nu_1}$, where $\nu_1$ may be obtained
approximately from equation (4) with $A = 2V_n^2/k$.
The object of the present note is to compare these two approximations.


2. COMPARISON OF PROBABILITY INTEGRALS FOR THE CASE OF A SINGLE SAMPLE 


The most searching test of the approximations occurs in the case of a single range, i.e. when 
k = 1. The true probability integral of range for n = 2(1)20 has been tabled by Pearson 
& Hartley (1942). From these tables values of w/o were selected as near as possible to the 


Table 1. Probability integral approximations 

































































(Errors are given in units of the fourth decimal place.)

n = 4
                        $\chi$-approx.         $\chi^2$-approx.
$w/\sigma$   True P.I.    P.I.     Error      P.I.     Error
0.35     0.0053     0.0058     + 5     0.0012    - 41
0.45     0.0111     0.0119     + 8     0.0036    - 75
0.75     0.0483     0.0499     +16     0.0306    -177
1.00     0.1057     0.1074     +17     0.0877    -180
1.30     0.2054     0.2065     +11     0.1973    - 81
2.00     0.5096     0.5079     -17     0.5303    +207
2.80     0.8045     0.8031     -14     0.8153    +108
3.25     0.9016     0.9013     - 3     0.9021    +  5
3.65     0.9516     0.9520     + 4     0.9470    - 46
4.40     0.9899     0.9904     + 5     0.9849    - 50
4.70     0.9951     0.9955     + 4     0.9911    - 40

n = 6
                        $\chi$-approx.         $\chi^2$-approx.
$w/\sigma$   True P.I.    P.I.     Error      P.I.     Error
0.75     0.0050     0.0060     +10     0.0018    - 32
0.90     0.0117     0.0134     +17     0.0058    - 59
1.25     0.0495     0.0526     +31     0.0383    -112
1.50     0.1031     0.1061     +30     0.0925    -106
1.80     0.2000     0.2012     +12     0.1969    - 31
2.45     0.4899     0.4858     -41     0.5046    +147
3.25     0.8053     0.8030     -23     0.8119    + 66
3.65     0.8981     0.8982     + 1     0.8978    -  3
4.05     0.9519     0.9533     +14     0.9481    - 38
4.75     0.9898     0.9910     +12     0.9862    - 36
5.05     0.9952     0.9960     + 8     0.9925    - 27

n = 10
                        $\chi$-approx.         $\chi^2$-approx.
$w/\sigma$   True P.I.    P.I.     Error      P.I.     Error
1.35     0.0054     0.0076     +22     0.0034    - 20
1.50     0.0117     0.0149     +32     0.0085    - 32
1.85     0.0479     0.0530     +51     0.0425    - 54
2.10     0.1015     0.1061     +46     0.0971    - 44
2.40     0.2025     0.2034     + 9     0.2022    -  3
3.00     0.4878     0.4800     -78     0.4954    + 76
3.75     0.8062     0.8026     -36     0.8089    + 27
4.15     0.9038     0.9047     + 9     0.9030    -  8
4.45     0.9474     0.9500     +26     0.9453    - 21
5.15     0.9898     0.9919     +21     0.9880    - 18
5.40     0.9948     0.9962     +14     0.9934    - 14

n = 15
                        $\chi$-approx.         $\chi^2$-approx.
$w/\sigma$   True P.I.    P.I.     Error      P.I.     Error
1.80     0.0049     0.0077     + 28    0.0041    -  8
1.95     0.0108     0.0150     + 42    0.0096    - 12
2.30     0.0468     0.0535     + 67    0.0451    - 17
2.55     0.1026     0.1083     + 57    0.1015    - 11
2.85     0.2103     0.2105     +  2    0.2109    +  6
3.40     0.4885     0.4775     -110    0.4909    + 24
4.10     0.8036     0.7990     - 46    0.8043    +  7
4.45     0.8964     0.8975     + 11    0.8962    -  2
4.80     0.9505     0.9543     + 38    0.9498    -  7
5.45     0.9900     0.9928     + 28    0.9895    -  5
5.70     0.9950     0.9968     + 18    0.9946    -  4





















































132 Comparison of two approximations to the distribution of range 


median and the lower and upper 20, 10, 5, 1 and 0.5 % points for samples with n = 4,
6, 10 and 15. These 11 values of $w/\sigma$ and the corresponding true probability integral (P.I.)
are shown in Table 1 for the four values of n. Using the appropriate scale factors $c_1$, $c_2$
and degrees of freedom $\nu_1$, $\nu_2$,* it was then possible to compute the corresponding
probability integrals for both the $\chi$ and the $\chi^2$ approximations, with the help of the Tables
of the Incomplete Gamma Function (K. Pearson, 1922). In certain cases, near the start of
the curve where interpolation in the Tables is difficult, the probability integral was calculated
from a few terms of the series

$$P(\chi^2 \mid \nu) = e^{-m} \sum_{j=0}^{\infty} m^{d+j} / \Gamma(d + j + 1),$$

where $m = \tfrac{1}{2}\chi^2$, $d = \tfrac{1}{2}\nu$. The resulting approximate probability integrals are shown in
Table 1 as well as the differences

Error = Approximate P.I. - True P.I.
It will be seen that, up to n = 15: 

(a) The errors are of opposite sign, the $\chi$-approximation having a distribution less skew
and the $\chi^2$-approximation a distribution more skew than the true distribution.

(b) In both cases, the error is greater at the lower than at the upper tail of the curves.

(c) Broadly speaking, the approximations are of equal accuracy at n = 10. For n < 10
the $\chi$-approximation and for n > 10 the $\chi^2$-approximation is the more accurate.
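
One entry of Table 1 can be reproduced approximately with the series quoted above. The sketch assumes that the scale factor of the $\chi^2$-approximation equals the mean of $w/\sigma$, taken here as the standard tabulated value 2.059 for n = 4 (a figure not given in this paper), together with ν₂ = 10.95 from Table 3; the function name `chi2_cdf` and the fixed number of terms are choices made here.

```python
import math

def chi2_cdf(x, nu, terms=200):
    """P(chi^2 <= x | nu) from the series e^{-m} * sum_j m^(d+j) / Gamma(d+j+1)."""
    if x <= 0:
        return 0.0
    m, d = 0.5 * x, 0.5 * nu
    total = 0.0
    for j in range(terms):
        # Each term is evaluated on the log scale to avoid overflow.
        total += math.exp(-m + (d + j) * math.log(m) - math.lgamma(d + j + 1.0))
    return total

nu2 = 10.95          # degrees of freedom of the chi^2-approximation for n = 4
mean_range = 2.059   # assumed mean of w/sigma for n = 4 (standard tabulated value)

# Under w/sigma = c2 * chi^2 / nu2, the event w/sigma <= 2.00 is chi^2 <= nu2 * 2.00 / c2.
p_cox = chi2_cdf(nu2 * 2.00 / mean_range, nu2)
print(round(p_cox, 4))   # Table 1 prints 0.5303 for n = 4, w/sigma = 2.00
```

Because the constants used here are rounded, agreement with the printed value to the third decimal place is all that should be expected.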

No simple criterion can be given to discriminate between an approximation which is 
adequate and one which is not. What may be quite satisfactory in a simple test of significance 
may lead to more serious error in the distribution of derived statistics. In §4 below, mention 
is made of the possible use of the ratio of maximum to minimum range in a set of k 
independent ranges as a test for heterogeneity of variance. Here, the form of the range 
distribution in the tails becomes of increasing importance as k increases, and before the test 
referred to is based on either of these approximations, some investigation into the effect of 
the ‘errors’ given in Table 1 is needed. 


3. THE $\beta_1$, $\beta_2$ DIAGRAM


Both approximate distributions have the correct mean and variance, whether a single
range or a mean range is used. Their accuracy will, therefore, depend on the extent to which
they have the correct ‘shape’. This may be usefully examined by calculating the moment
ratios

$$\beta_1 = \gamma_1^2 = \mu_3^2/\mu_2^3 \quad \text{and} \quad \beta_2 = \gamma_2 + 3 = \mu_4/\mu_2^2,$$

and plotting the resulting $\beta_1$, $\beta_2$ points for the true and approximate distributions. Four
main series of points are shown in Fig. 1:

(a) Those for the distribution of $\chi$ (or s). These points lie on a curve which sweeps round
to the Normal point (0, 3) as the degrees of freedom $\nu$ increase. The values of $\beta_1(s)$ and $\beta_2(s)$
were taken from Tables for Statisticians and Biometricians, Part II, Table XVII.

(b) Those for the distribution of $\chi^2$ (or $s^2$). These lie on the type III line $2\beta_2 - 3\beta_1 - 6 = 0$,
and converge on the Normal point as $\nu$ increases much more slowly than the corresponding
points for $\chi$.

(c) Those for the distribution of the range w. It is the peculiar character of this curve,

* Numerical values of the constants have been given by Cox (1949), Patnaik (1950) and (for $c_1$, $\nu_1$)
by David (1951). As greater accuracy was needed, however, values were recomputed from the appropriate
formulae.










which first swings in and then turns away from the Normal point as n increases, that makes
approximation by a single type of curve difficult. The values of $\beta_1(w)$ and $\beta_2(w)$ were taken
from the table given by Hartley & Pearson (1951).

(d) Those for the distribution of $\log_e \chi^2$ (or $\log_e s^2$). The $\beta_1$, $\beta_2$ values were obtained from
the asymptotic expressions given by Bartlett & Kendall (1946), which are correct to the
accuracy needed for $\nu > 20$. While this distribution is not strictly relevant to the present
inquiry, it seemed of interest to include a plot of the points.* It will be noticed that while,
for a given $\nu$, the distribution of $\log_e s^2$ is more nearly normal than that of $s^2$, it is considerably
further from the normal than the distribution of s. The $\log_e s^2$ distribution is
negatively skew, i.e. $\gamma_1 = \sqrt{\beta_1} < 0$, and in this sense the positive skewness of the $s^2$ distribution
may be said to have been overcorrected by the transformation. The great merit
of the logarithmic transformation is, of course, that it gives a statistic whose variance is
independent of $\sigma^2$.

Without further aid we cannot interpret a diagram like Fig. 1, since ‘distances’ in different
directions and in different parts of the field are not in constant proportion to differences in
probability integrals. Experience has, however, shown that the great majority of sampling
distributions commonly met are well represented by curves of the Pearson system having
the same $\beta_1$, $\beta_2$ values. This means that Fig. 1 may be interpreted with the help of Pearson
& Merrington’s (1951) tables of the upper and lower 5 and 0.5 % points of this system, expressed
in standard measure.† A brief extract from these tables is given in Table 2 below. This shows
that within the region of the field with which we are concerned, changes in $\beta_1$ are of much
more importance than changes in $\beta_2$.


Table 2. Standardized deviates, $(x - \bar{x})/\sigma$, for curves of the Pearson system.
(Positive skewness, i.e. $\mu_3 > 0$)

                  Lower 5 % points                 Upper 5 % points
$\beta_2$ \ $\beta_1$   0.00   0.05   0.15   0.30        0.00   0.05   0.15   0.30
3.0         1.64   1.58   1.52   1.44        1.64   1.71   1.76   1.82
3.2         1.64   1.58   1.53   1.46        1.64   1.70   1.75   1.80
3.4         1.64   1.58   1.53   1.47        1.64   1.69   1.74   1.79

                  Lower 0.5 % points               Upper 0.5 % points
$\beta_2$ \ $\beta_1$   0.00   0.05   0.15   0.30        0.00   0.05   0.15   0.30
3.0         2.58   2.33   2.11   1.84        2.58   2.76   2.86   2.93
3.2         2.65   2.42   2.20   1.95        2.65   2.83   2.93   3.01
3.4         2.71   2.48   2.28   2.04        2.71   2.88   2.99   3.07





























* These points lie very close to the Type V line of K. Pearson’s system and also not far from the line
representing log-normal distributions, but it is not clear that this correspondence can be turned to any
useful account.

† For example, to the accuracy with which interpolation in these tables is possible, we obtain from
them correct values of the 5 and 0.5 % points of the s and w distributions represented in Fig. 1 except
for small errors in the lower 0.5 % points for s when $\nu = 3$ and for w when n = 4.





[Figure: $\beta_2$ (vertical scale, 3.00 to 3.28) plotted against $\beta_1$ (horizontal scale, 0.00 to 0.30)
for the sampling distributions discussed in the text.]

Fig. 1. Diagram showing relation between $\beta_1$, $\beta_2$ points of various sampling distributions.


Table 3. Comparison of approximations to the distribution of the range

        $\chi$-approx.            $\chi^2$-approx.
 n        $\nu_1$          $\nu_2$       $\beta_1$      $\beta_2$
 4        2.93         10.95     0.731     4.10
 6        4.68         17.86     0.448     3.67
10        7.68         29.82     0.268     3.40
15       10.77         42.16     0.189     3.28































Consider now the distribution of range in a single sample. As an illustration, Table 3 gives
for n = 4, 6, 10 and 15 the degrees of freedom $\nu_1$ and $\nu_2$ of the two approximations and the
$\beta_1$, $\beta_2$ values of the $\chi^2$-approximation (as the first three of these fall outside Fig. 1). For
n = 4, 6 and 10 the $\beta_1$, $\beta_2$ points of the $\chi$-approximation are marked by triangles in Fig. 1;
these have been joined by broken lines to the corresponding true points for range. The
effect of the divergence of the beta-curves for w and $\chi$ is clear. After good agreement when
n = 4, there is a gradual deterioration in fit as n increases, but this does not become
appreciable until n reaches, say, 7 or 8. Even at n = 10, Table 1 shows fair agreement in the
probability integrals.

For the $\chi^2$-approximation, the $\beta_1$, $\beta_2$ points are far outside the diagram for n = 4 and 6,
so that the distribution is much too skew. At n = 10 the points for the two approximations
are at about equal distances away from the true point, but in opposite directions. When
n = 15 the true and $\chi^2$-approximation points are still approaching one another; at n = 20,
with $\nu_2 = 52.54$, they have passed and at greater sample sizes will continue to diverge.
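
The position of the $\chi^2$-approximation in the diagram follows from the standard moment ratios of the $\chi^2$ distribution, $\beta_1 = 8/\nu$ and $\beta_2 = 3 + 12/\nu$. These formulae are not written out in the paper and are used here as a known property of the distribution; the function name is chosen for illustration. The sketch also confirms that every such point lies on the type III line.

```python
def chi2_betas(nu):
    """Moment ratios (beta_1, beta_2) of chi^2 with nu degrees of freedom."""
    beta1 = 8.0 / nu
    beta2 = 3.0 + 12.0 / nu
    return beta1, beta2

# Degrees of freedom nu_2 quoted for n = 4, 6, 10 and 15.
for n, nu2 in [(4, 10.95), (6, 17.86), (10, 29.82), (15, 42.16)]:
    b1, b2 = chi2_betas(nu2)
    on_line = abs(2.0 * b2 - 3.0 * b1 - 6.0) < 1e-9   # type III line
    print(n, round(b1, 3), round(b2, 2), on_line)
```

The printed values agree with the last two columns of Table 3 to within rounding of the quoted degrees of freedom.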

We may next consider very briefly the relation of the approximations to the distribution
of $\bar{w}$, the mean of k ranges. The true $\beta_1$, $\beta_2$ point of $\bar{w}$ will lie on the straight line joining the
Normal point (0, 3) to the point $\beta_1(w)$, $\beta_2(w)$, such that $\beta_1(\bar{w}) = \beta_1(w)/k$. Fig. 1 shows the
points for the mean range in k = 5 samples of (a) n = 6 and (b) n = 15 observations. For
these cases the equivalent degrees of freedom of the two approximations are:

                       $\nu_1$ for $\chi$-approx.     $\nu_2$ for $\chi^2$-approx.
5 samples of 6               22.6                  89.3
5 samples of 15              52.9                 210.8

Although no detailed calculations have been made, it appears that the $\chi$-approximation
will be the more accurate for n = 6 and the $\chi^2$-approximation for n = 15. As k increases,
all distributions for mean range are, of course, becoming nearer and nearer to the normal,
and any differences will become progressively of less account.
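
The degrees of freedom quoted for the mean range can be recovered from the single-sample values. In this sketch the $\chi^2$ figure is k times the single-sample ν₂, and the $\chi$ figure comes from Florin's series with A = 2V²/k, where V² = 2/ν₂; the series coefficients 3/16 and 3/64 are those of the large-ν expansion of equation (3), and the function names are labels chosen here.

```python
def florin_nu1(a):
    """Series approximation to nu_1 as a function of A."""
    return 1.0 / a + 0.25 - 3.0 * a / 16.0 + 3.0 * a * a / 64.0

def mean_range_dof(nu2_single, k):
    """Equivalent degrees of freedom (nu_1, nu_2) for the mean of k ranges."""
    v2 = 2.0 / nu2_single            # squared coefficient of variation of one range
    nu2_mean = k * nu2_single        # additive property of chi^2
    nu1_mean = florin_nu1(2.0 * v2 / k)
    return nu1_mean, nu2_mean

for n, nu2_single in [(6, 17.86), (15, 42.16)]:
    nu1_mean, nu2_mean = mean_range_dof(nu2_single, k=5)
    print(n, round(nu1_mean, 1), round(nu2_mean, 1))
```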


4. A TEST FOR HETEROGENEITY OF VARIANCE 


Hartley (1950) has suggested that if a number k of independent variance estimates
$s_t^2$ (t = 1, 2, ..., k) are available, each based on the same number of degrees of freedom $\nu$, then
a short-cut test of the hypothesis that the variances in the k sampled populations (assumed
normal) are the same, may be made using the ratio $s^2$(max.)/$s^2$(min.) of the maximum to
the minimum values of the k estimates. He provided an approximate table of the upper
5 % points of this ratio for k = 2(1)12 and $\nu$ = 2(1)10, 12, 15, 20, 30, 60, $\infty$. It is hoped to
publish shortly an accurate table of both 5 % and 1 % points of $s^2$(max.)/$s^2$(min.).*
Clearly, to the accuracy involved in the approximations, we may use an even quicker
test of variance heterogeneity by noting the maximum and minimum ranges among the
k groups of observations, finding the ratio w(max.)/w(min.) and then using Hartley’s table:
(a) with $\nu_1$ degrees of freedom and the square of the range ratio for the $\chi$-approximation;
(b) with $\nu_2$ degrees of freedom and the range ratio itself for the $\chi^2$-approximation.
No calculation of the scale factors $c_1$ or $c_2$ is required. The method needs further
investigation, but it might be supposed that more accurate results would be obtained
with the $\chi$-approximation when n < 10 and with the $\chi^2$-approximation when n > 10.


* This table has now been computed by Mr H. A. David. 













Illustration 


Suppose that we have 5 samples of 10 observations. What is the 5 % level for the ratio 
w(max.)/w(min.) if the samples have been drawn independently from normal populations 
having a common variance? 

(a) $\chi$-approximation. $\nu_1 = 7.68$. Interpolating in Hartley’s table (as adjusted by David),
it is found that for k = 5, $\nu = 7.68$ the 5 % limit for $s^2$(max.)/$s^2$(min.) is 8.59. The square
root of this value, or 2.93, gives the approximation to the 5 % limit for the range ratio.

(b) $\chi^2$-approximation. $\nu_2 = 29.82$. Interpolation in Hartley’s adjusted table gives,
directly, 2.79 as the 5 % point for the ratio.

The difference between 2.93 and 2.79 no doubt lies in the fact, brought out in Table 1,
that the probability integrals at the tails of the $\chi$- and $\chi^2$-approximations differ in opposite
senses from the true probability integral. At a guess, the true 5 % point for w(max.)/w(min.)
when k = 5, n = 10, may lie midway between the two approximations, at about 2.86.
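
The arithmetic of the illustration, together with the statistic itself, can be sketched as follows. The two tabular limits 8.59 and 2.79 are taken from the text; the function name `range_ratio` and the three small samples are invented for illustration.

```python
import math

def range_ratio(samples):
    """Ratio of the largest to the smallest sample range, w(max.) / w(min.)."""
    ranges = [max(s) - min(s) for s in samples]
    return max(ranges) / min(ranges)

limit_chi = math.sqrt(8.59)   # chi-approximation: square root of the variance-ratio limit
limit_chi2 = 2.79             # chi^2-approximation: tabular value used directly
midpoint = 0.5 * (round(limit_chi, 2) + limit_chi2)
print(round(limit_chi, 2), round(midpoint, 2))

# Made-up samples: ranges 3.0, 1.5 and 2.0 give a ratio of 2.0,
# which falls below either approximate limit.
example_ratio = range_ratio([[1.0, 2.0, 4.0], [0.0, 1.0, 1.5], [5.0, 6.0, 7.0]])
print(example_ratio)
```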

While a test based on w(max.)/w(min.) will in general be less powerful than the well- 
known test for heterogeneity of variance (Neyman & Pearson (1931), Bartlett (1937)), there 
is clearly much to be said for a short-cut test based on this range ratio. If tables are available, 
its application in a preliminary survey of data is immediate. It also conforms with the 
intuitive method of judging whether there is heterogeneity by comparing the range of 
observations in the group having greatest spread with that in the group having least 
spread. 


REFERENCES 


Bartlett, M. S. (1937). Proc. Roy. Soc. A, 160, 268.
Bartlett, M. S. & Kendall, D. G. (1946). J.R. Statist. Soc. Suppl. 8, 128.
Cox, D. R. (1949). J.R. Statist. Soc. B, 11, 101.
David, H. A. (1951). Biometrika, 38, 393.
Fiorin, H. (1950). Commun. Roy. Flem. Acad. (Science Series), 12, 6.
Hartley, H. O. (1950). Biometrika, 37, 308.
Hartley, H. O. & Pearson, E. S. (1951). Biometrika, 38, 463.
Neyman, J. & Pearson, E. S. (1931). Bull. Int. Acad. Cracovie, A, 460.
Patnaik, P. B. (1950). Biometrika, 37, 78.
Pearson, E. S. & Hartley, H. O. (1942). Biometrika, 32, 302.
Pearson, E. S. & Merrington, Maxine (1951). Biometrika, 38, 4.
Pearson, K. (1922). Tables of the Incomplete Gamma Function. Cambridge University Press.

























THE COVERING CIRCLE OF A SAMPLE FROM A CIRCULAR 
NORMAL DISTRIBUTION 


By H. E. DANIELS 
Statistical Laboratory, University of Cambridge 


1. INTRODUCTION 
Consider a sample of $n$ observations from the circular normal distribution

$$dF = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/2\sigma^2}\, dx\, dy.$$


The covering circle of the sample is defined to be the smallest circle in the x, y plane 
containing on or within it every sample point. The purpose of this paper is to study the 
distribution of the radius and centre of the covering circle. They are bivariate analogues of 
the half-range and mid-range point of a univariate normal sample. 

My attention was drawn to the problem by Mr D. Fraser of the Psychological Laboratory, 
Cambridge, who used the radius of the covering circle as a convenient rapid measure of 
dispersion. 


2. SOME PRELIMINARY FORMULAE 


We shall require some known results concerning a simpler problem of the same kind (see, 
for example, Quenouille, 1949). 

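The chance that a sample point lies within a circle of radius $r$ centred at a given
distance $\rho$ from the true mean is $P(r, \rho)$, where*

$$P(r,\rho) = e^{-\rho^2/2\sigma^2}\int_0^r e^{-s^2/2\sigma^2}\, I_0\!\left(\frac{s\rho}{\sigma^2}\right)\frac{s\,ds}{\sigma^2}, \tag{2.1}$$

$I_\nu(z)$ being the Bessel function of imaginary argument and order $\nu$, which is given by

$$I_\nu(z) = \frac{1}{2\pi}\int_0^{2\pi} e^{z\cos\theta}\cos\nu\theta\, d\theta,$$

when $\nu$ is an integer. Since $I_0'(z) = I_1(z)$,

$$\frac{\partial P(r,\rho)}{\partial\rho} = -\frac{\rho}{\sigma^2}\,P(r,\rho) + e^{-\rho^2/2\sigma^2}\int_0^r e^{-s^2/2\sigma^2}\, I_1\!\left(\frac{s\rho}{\sigma^2}\right)\frac{s^2\,ds}{\sigma^4},$$

whence, integrating by parts, using the fact that

$$\frac{d}{dz}\{zI_1(z)\} = zI_0(z),$$

we find

$$\frac{\partial P(r,\rho)}{\partial\rho} = -\frac{r}{\sigma^2}\, e^{-(r^2+\rho^2)/2\sigma^2}\, I_1\!\left(\frac{r\rho}{\sigma^2}\right), \tag{2.2}$$

and

$$P(r,\rho) = \frac{r}{\sigma^2}\, e^{-r^2/2\sigma^2}\int_\rho^\infty e^{-u^2/2\sigma^2}\, I_1\!\left(\frac{ru}{\sigma^2}\right) du. \tag{2.3}$$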
o ° o 


* This function has been discussed in classified reports by W. R. Hynd and the author in Britain 
and A. N. Lowan in the U.S.A. The function 1—P has been extensively tabulated by the Rand 
Corporation. 













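Also

$$P(r,\rho) = 1 - e^{-\rho^2/2\sigma^2}\int_r^\infty e^{-s^2/2\sigma^2}\, I_0\!\left(\frac{s\rho}{\sigma^2}\right)\frac{s\,ds}{\sigma^2}
= 1 - e^{-(r^2+\rho^2)/2\sigma^2}\, I_0\!\left(\frac{r\rho}{\sigma^2}\right) - P(\rho, r). \tag{2.4}$$

Further partial integration gives the expansion

$$1 - P(r,\rho) = e^{-(r^2+\rho^2)/2\sigma^2}\left\{ I_0\!\left(\frac{r\rho}{\sigma^2}\right) + \frac{\rho}{r}\, I_1\!\left(\frac{r\rho}{\sigma^2}\right) + \frac{\rho^2}{r^2}\, I_2\!\left(\frac{r\rho}{\sigma^2}\right) + \dots \right\}, \tag{2.5}$$

and similarly, from (2.1),

$$P(r,\rho) = e^{-(r^2+\rho^2)/2\sigma^2}\left\{ \frac{r}{\rho}\, I_1\!\left(\frac{r\rho}{\sigma^2}\right) + \frac{r^2}{\rho^2}\, I_2\!\left(\frac{r\rho}{\sigma^2}\right) + \dots \right\}. \tag{2.6}$$

Both series converge for all $r$ and $\rho$, their sum yielding the familiar Laurent development

$$e^{(r^2+\rho^2)/2\sigma^2} = \sum_{m=-\infty}^{\infty} \left(\frac{r}{\rho}\right)^m I_m\!\left(\frac{r\rho}{\sigma^2}\right).$$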
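The two series can be checked against each other numerically. The following is a minimal sketch, assuming the forms of (2.5) and (2.6) given above; the function names are illustrative, and the Bessel functions are summed from their power series rather than taken from a library. Each term $(r/\rho)^k I_k(r\rho/\sigma^2)$ is rewritten as $(r^2/2\sigma^2)^k \sum_m (r\rho/2\sigma^2)^{2m}/\{m!\,(m+k)!\}$ so that $\rho = 0$ causes no division by zero.

```python
import math


def scaled_bessel_term(a, z, k, terms=80):
    """Return a**k * sum_m (z/2)**(2m) / (m! (m+k)!).

    With a = r^2/(2 sigma^2) and z = r*rho/sigma^2 this equals
    (r/rho)**k * I_k(z); with a = rho^2/(2 sigma^2) it equals
    (rho/r)**k * I_k(z).
    """
    term = a ** k / math.factorial(k)  # the m = 0 term
    total = term
    q = (z / 2.0) ** 2
    for m in range(1, terms):
        term *= q / (m * (m + k))
        total += term
    return total


def prob_inside(r, rho, sigma=1.0, kmax=80):
    """P(r, rho) summed from series (2.6)."""
    s2 = sigma * sigma
    z = r * rho / s2
    a = r * r / (2.0 * s2)
    pref = math.exp(-(r * r + rho * rho) / (2.0 * s2))
    return pref * sum(scaled_bessel_term(a, z, k) for k in range(1, kmax))


def prob_outside(r, rho, sigma=1.0, kmax=80):
    """1 - P(r, rho) summed from series (2.5)."""
    s2 = sigma * sigma
    z = r * rho / s2
    a = rho * rho / (2.0 * s2)
    pref = math.exp(-(r * r + rho * rho) / (2.0 * s2))
    return pref * sum(scaled_bessel_term(a, z, k) for k in range(0, kmax))
```

With $r = 1.5$, $\rho = 0.8$, $\sigma = 1$ the two sums should add to unity, which is the Laurent development in another guise, and the pair $P(r,\rho)$, $P(\rho,r)$ should satisfy (2.4).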


3. JOINT DISTRIBUTION OF THE RADIUS AND CENTRE OF THE COVERING CIRCLE 
Samples may be divided into two main categories, for which the covering circle has 
respectively the following properties: 

(i) The circle passes through three of the points forming an acute-angled triangle, and 
the remainder of the points lie inside the circle. 

(ii) The circle passes through two diametrically opposite points, the remainder lying 
inside the circle. 

These two contingencies are mutually exclusive, and the remaining cases where the 
circle passes through more than three points can be shown to be relatively of negligible 
probability. The required joint distribution is therefore obtained by adding the probabilities 
for cases (i) and (ii), which we now evaluate. 


For samples of type (i) the chance that the radius lies between $r$ and $r + dr$ and that the
centre is at distance $\rho$ to $\rho + d\rho$ from the origin is

$$dF_1 = \tfrac{1}{3}\, n(n-1)(n-2)\, dQ_1\, P^{n-3}(r, \rho), \tag{3.1}$$

where $dQ_1$ is the chance that three 'labelled' points lie in a specified order round the circle,
and $P(r,\rho)$ is given by (2.1). The factor $n(n-1)(n-2)$ is the number of ways of selecting
the three points, and the divisor 3 accounts for the identity of orders 123, 231, 312 on the
circle. To find $dQ_1$, let the three points have coordinates $x_j, y_j$ $(j = 1, 2, 3)$ and write

$$x_j = \rho\cos\phi + r\cos\theta_j, \qquad y_j = \rho\sin\phi + r\sin\theta_j,$$

where $\rho, \phi$ are the polar co-ordinates of the centre, and $0 < \theta_1 < \theta_2 < \theta_3 < 2\pi$. The Jacobian is

$$\frac{\partial(x_1, y_1, x_2, y_2, x_3, y_3)}{\partial(\rho, r, \phi, \theta_1, \theta_2, \theta_3)} = r^3\rho\,\{\sin(\theta_2-\theta_1) + \sin(\theta_3-\theta_2) + \sin(\theta_1-\theta_3)\}, \tag{3.2}$$

and the points will always form an acute-angled triangle if

$$0 < \theta_1 < 2\pi, \qquad \theta_1 < \theta_2 < \theta_1 + \pi, \qquad \theta_1 + \pi < \theta_3 < \theta_2 + \pi. \tag{3.3}$$




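The probability element for the three points transforms to

$$\exp\left\{-\frac{3(r^2+\rho^2)}{2\sigma^2}\right\}\frac{r^3\rho\,dr\,d\rho}{(2\pi\sigma^2)^3}\,\exp\left[-\frac{r\rho}{\sigma^2}\{\cos(\theta_1-\phi) + \cos(\theta_2-\phi) + \cos(\theta_3-\phi)\}\right]$$
$$\times\,\{\sin(\theta_2-\theta_1) + \sin(\theta_3-\theta_2) + \sin(\theta_1-\theta_3)\}\, d\phi\, d\theta_1\, d\theta_2\, d\theta_3,$$

and $dQ_1$ is obtained on integrating the $\theta$'s over (3.3) and $\phi$ from 0 to $2\pi$.

Since, by symmetry, integration of the $\theta$'s must give a result independent of $\phi$, the
$\phi$ integration merely introduces a factor $2\pi$. The $\theta$ integrations can be effected by putting
$\phi = 0$, expanding $\sin(\theta_2-\theta_1) + \sin(\theta_3-\theta_2) + \sin(\theta_1-\theta_3)$ and integrating the six terms
separately. After some manipulation it is found that

$$dQ_1 = \frac{3r}{\sigma^4}\exp\left\{-\frac{3(r^2+\rho^2)}{2\sigma^2}\right\} I_1\!\left(\frac{r\rho}{\sigma^2}\right) r\,dr\,d\rho. \tag{3.4}$$

For samples of type (ii) the chance of $r$ to $r + dr$, $\rho$ to $\rho + d\rho$, is by a similar argument

$$dF_2 = \tfrac{1}{2}\, n(n-1)\, dQ_2\, P^{n-2}(r, \rho), \tag{3.5}$$

where $dQ_2$ is the chance that two labelled points are at diametrically opposite positions on
the circle.
The two points may be given co-ordinates

$$x_1 = \rho\cos\phi + r\cos\theta, \qquad y_1 = \rho\sin\phi + r\sin\theta,$$
$$x_2 = \rho\cos\phi - r\cos\theta, \qquad y_2 = \rho\sin\phi - r\sin\theta,$$

where $0 < \theta < 2\pi$, $0 < \phi < 2\pi$, and since

$$\frac{\partial(x_1, y_1, x_2, y_2)}{\partial(\rho, r, \phi, \theta)} = 4r\rho,$$

it follows without difficulty that

$$dQ_2 = \frac{4}{\sigma^4}\, e^{-(r^2+\rho^2)/\sigma^2}\, r\rho\, dr\, d\rho. \tag{3.6}$$

The joint distribution of $r$ and $\rho$ is therefore

$$dF(r,\rho) = dF_1 + dF_2 = \left\{ \frac{n(n-1)(n-2)}{\sigma^4}\exp\left[-\frac{3(r^2+\rho^2)}{2\sigma^2}\right] r^2\, I_1\!\left(\frac{r\rho}{\sigma^2}\right) P^{n-3}(r,\rho) \right.$$
$$\left. +\ \frac{2n(n-1)}{\sigma^4}\exp\left[-\frac{r^2+\rho^2}{\sigma^2}\right] r\rho\, P^{n-2}(r,\rho) \right\} dr\, d\rho,$$

which reduces, by (2.2), to the form

$$dF(r,\rho) = -n(n-1)\,\frac{r\,dr}{\sigma^2}\,\frac{\partial}{\partial\rho}\left\{ e^{-(r^2+\rho^2)/\sigma^2}\, P^{n-2}(r,\rho)\right\} d\rho. \tag{3.7}$$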


4. DISTRIBUTION OF THE RADIUS 


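On integrating $\rho$ from 0 to $\infty$ the distribution of $r$ is found to be

$$dF_n(r) = n(n-1)\, e^{-r^2/\sigma^2}\,(1 - e^{-r^2/2\sigma^2})^{n-2}\,\frac{r\,dr}{\sigma^2}. \tag{4.1}$$

This may be contrasted with

$$d\{P^n(r, 0)\} = n\, e^{-r^2/2\sigma^2}\,(1 - e^{-r^2/2\sigma^2})^{n-1}\,\frac{r\,dr}{\sigma^2},$$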













which is the distribution of the radius of the smallest circle, centred at the true mean,
enclosing all sample points. On the other hand, it is remarkable that (4.1) is the same as the
distribution of the radius of the smallest circle, centred at the true mean, enclosing all but
one of the sample points.

The distribution function for $r$ can be written

$$F_n(r) = n(1 - e^{-r^2/2\sigma^2})^{n-1} - (n-1)(1 - e^{-r^2/2\sigma^2})^{n}, \tag{4.2}$$

or it may be computed from tables of the Incomplete B-function (Pearson, 1934) as
$1 - I_z(2, n-1)$, with $z = e^{-r^2/2\sigma^2}$ in the notation of the tables quoted. Upper and lower confidence
limits for $\sigma$, given $r$, can also be found from the relation

$$\frac{r}{\sigma} = \left(2\log\{\tfrac{1}{2}(n-1)F + 1\}\right)^{1/2}, \tag{4.3}$$

where $F$ has Fisher's variance ratio distribution with $2(n-1)$ and 4 degrees of freedom
(see Table 1).
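The percentage points of $r/\sigma$ can be recovered from (4.2) by direct inversion. This is a minimal sketch, assuming the closed form of $F_n(r)$ given above; the function names and the bisection bracket are illustrative choices.

```python
import math


def radius_cdf(r, n, sigma=1.0):
    """F_n(r) of (4.2): distribution function of the covering-circle radius."""
    p = -math.expm1(-r * r / (2.0 * sigma * sigma))  # 1 - exp(-r^2 / 2 sigma^2)
    return n * p ** (n - 1) - (n - 1) * p ** n


def radius_quantile(prob, n, sigma=1.0):
    """Invert F_n by bisection; F_n is increasing in r on (0, infinity)."""
    lo, hi = 0.0, 20.0 * sigma  # F_n(20 sigma) is 1 to machine accuracy for moderate n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if radius_cdf(mid, n, sigma) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sigma_limits(r, n, lower_prob=0.05, upper_prob=0.95):
    """Confidence limits for sigma given an observed radius r (unit-sigma quantiles)."""
    return r / radius_quantile(upper_prob, n), r / radius_quantile(lower_prob, n)
```

For $n = 2$ and $n = 10$ the computed 5 % and 95 % points may be compared with the corresponding entries of Table 1.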


Table 1. Distribution of $r/\sigma$. Mean, variance and percentage points

   n     Mean    Variance     1 %     5 %     95 %    99 %
   2    0.886     0.2146    0.100   0.226   1.731   2.146
   3    1.211     0.1990    0.348   0.539   2.000   2.380
   4    1.409     0.1827    0.541   0.756   2.157   2.518
   5    1.548     0.1700    0.709   0.916   2.268   2.616
   6    1.655     0.1601    0.835   1.041   2.352   2.691
   7    1.742     0.1522    0.939   1.142   2.421   2.752
   8    1.814     0.1456    1.027   1.228   2.478   2.803
   9    1.876     0.1401    1.103   1.301   2.527   2.847
  10    1.929     0.1354    1.170   1.365   2.570   2.886
  11    1.977     0.1314    1.229   1.421   2.608   2.920
  12    2.020     0.1278    1.282   1.472   2.642   2.951
  13    2.058     0.1245    1.330   1.517   2.673   2.979
  14    2.093     0.1214    1.374   1.559   2.702   3.005
  15    2.125     0.1183    1.414   1.597   2.728   3.028
  20    2.255     0.1095    1.576   1.750   2.833   3.124
  30    2.427     0.0977    1.790   1.953   2.975   3.254
  40    2.543     0.0906    1.932   2.087   3.071   3.342
  50    2.629     0.0855    2.037   2.187   3.143   3.408
 100    2.881     0.0731    2.341   2.477   3.358   3.608





























The moment generating function for $r^2$ is

$$\frac{\Gamma(n+1)\,\Gamma(2 - 2t\sigma^2)}{\Gamma(n+1-2t\sigma^2)} = \prod_{m=2}^{n}\frac{1}{1 - 2t\sigma^2/m}, \tag{4.4}$$

and the even moments of $r$ about the origin, $\mu'_{2s} = E(r^{2s})$, are obtained by the usual relations
from the cumulants of $r^2$,

$$\kappa_j(r^2) = (j-1)!\,(2\sigma^2)^j \sum_{m=2}^{n}\frac{1}{m^j}. \tag{4.5}$$







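In particular

$$\mu'_2 = \kappa_1(r^2) = 2\sigma^2\left(\frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}\right) \sim 2\sigma^2(\log n - 1 + \gamma) \tag{4.6}$$

for large $n$, where $\gamma$ is Euler's constant 0.577216....
The following two formulae are useful for moments of any order.
From (4.1),

$$dF_n(r) = \sum_{m=0}^{n} (-)^m \binom{n}{m}\, m(m-1)\, e^{-mr^2/2\sigma^2}\,\frac{r\,dr}{\sigma^2} = (-)^n \Delta^n\{m(m-1)\, e^{-mr^2/2\sigma^2}\}_{m=0}\,\frac{r\,dr}{\sigma^2},$$

$\Delta$ operating on $m$. Hence

$$\mu'_s = E(r^s) = (-)^n\, 2^{s/2}\,\Gamma(\tfrac{1}{2}s + 1)\,\sigma^s\,\Delta^n\psi_0, \tag{4.7}$$

where $\psi_0 = 0$, $\psi_1 = 0$, $\psi_m = (m-1)/m^{s/2}$ $(m \geq 2)$. This formula is convenient for calculating
$\mu'_s/\sigma^s$ for small $n$, but becomes inaccurate when $n$ exceeds about 10.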
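Formula (4.7) is easily evaluated for small $n$. The sketch below assumes the form of (4.7) and of $\psi_m$ given above and expands $\Delta^n\psi_0$ as the alternating binomial sum $\sum_m (-1)^{n-m}\binom{n}{m}\psi_m$; the function names are illustrative.

```python
import math


def psi(m, s):
    """psi_m of (4.7): zero for m = 0, 1 and (m - 1) / m^(s/2) for m >= 2."""
    return 0.0 if m < 2 else (m - 1) / m ** (s / 2.0)


def radius_moment(s, n, sigma=1.0):
    """E(r^s) from (4.7), with the n-th forward difference written out in full."""
    nth_difference = sum(
        (-1) ** (n - m) * math.comb(n, m) * psi(m, s) for m in range(n + 1)
    )
    return (-1) ** n * 2.0 ** (s / 2.0) * math.gamma(s / 2.0 + 1.0) * sigma ** s * nth_difference


def radius_mean_and_variance(n, sigma=1.0):
    """Mean and variance of r, as tabulated (for sigma = 1) in Table 1."""
    mean = radius_moment(1, n, sigma)
    return mean, radius_moment(2, n, sigma) - mean * mean
```

For $n = 5$ this gives a mean and variance that can be set beside the Table 1 entries 1.548 and 0.1700, and the second moment agrees with (4.6). The alternating sum loses accuracy through cancellation as $n$ grows, which is in keeping with the remark that the formula becomes inaccurate beyond about $n = 10$ when worked to a fixed number of figures.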
For large $n$ there is the asymptotic expansion

$$\mu'_s = E\{(r^2 - \mu'_2) + \mu'_2\}^{s/2} = \sum_{m=0}^{\infty}\frac{\tfrac{1}{2}s(\tfrac{1}{2}s - 1)\cdots(\tfrac{1}{2}s - m + 1)}{m!}\,\frac{\mu_m(r^2)}{\mu_2'^{\,m - s/2}}, \tag{4.8}$$

where $\mu_m(r^2)$ is the $m$th moment of $r^2$ about its mean, calculated from the known cumulants
(4.5) in the ordinary way. But since $\mu'_2 \sim O(\log n)$, $n$ has to be fairly large for reasonable
accuracy in calculating the odd moments.

Table 1 gives the mean and variance of $r/\sigma$ for a range of values of $n$. It was computed by
using (4.7) and (4.8) at each end of the range and by numerical integration for the intermediate
values. Note that twice the mean value of $r$ is always greater than the corresponding mean range for
a univariate normal sample of $n$, as is otherwise evident from the fact that the diameter of
the covering circle is the greatest of the ranges of all univariate projections of the sample,
and hence its expectation must be greater than that of the mean range of all such projections.

When $n$ is large, (4.8) and (4.5) give

$$E(r) = \mu'_1 \sim \mu_2'^{\,1/2} \sim \sigma\sqrt{2\log n},$$

$$\operatorname{var} r = \mu'_2 - \mu_1'^{\,2} \sim \frac{1}{4}\,\frac{\mu_2(r^2)}{\mu'_2} \sim \left(\frac{\pi^2}{12} - \frac{1}{2}\right)\frac{\sigma^2}{\log n}.$$

The effective range of $r$ is therefore such that $e^{-r^2/2\sigma^2} = O(1/n)$, and (4.1) has the limiting
form

$$dF_n(r) \sim n^2\exp\left[-\frac{r^2}{\sigma^2} - n\,e^{-r^2/2\sigma^2}\right]\frac{r\,dr}{\sigma^2}\left\{1 + O\!\left(\frac{1}{n}\right)\right\}, \tag{4.9}$$

so that $n e^{-r^2/2\sigma^2}$ is approximately distributed as $\tfrac{1}{2}\chi^2$ with 4 degrees of freedom. For very large $n$, (4.9) in
turn approximates to the general form given by Gumbel (1936) for a quantile, but the
neglected terms are then $O(1/\log n)$.










5. DISTRIBUTION OF THE CENTRE
The distribution function for the distance $\rho$ of the centre from the true mean is, from (3.7),

$$F_n(\rho) = 1 - n(n-1)\, e^{-\rho^2/\sigma^2}\int_0^\infty e^{-r^2/\sigma^2}\, P^{n-2}(r, \rho)\,\frac{r\,dr}{\sigma^2}. \tag{5.1}$$


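This does not appear to reduce further in the general case. For $n = 3$, however, it takes
the following simple form:

$$F_3(\rho) = 1 - 6\,e^{-3\rho^2/2\sigma^2}\int_0^\infty e^{-r^2/\sigma^2}\,\frac{r\,dr}{\sigma^2}\int_0^r e^{-s^2/2\sigma^2}\, I_0\!\left(\frac{s\rho}{\sigma^2}\right)\frac{s\,ds}{\sigma^2}$$

$$= 1 - 3\,e^{-3\rho^2/2\sigma^2}\int_0^\infty e^{-3r^2/2\sigma^2}\, I_0\!\left(\frac{r\rho}{\sigma^2}\right)\frac{r\,dr}{\sigma^2} \quad \text{(by parts)}$$

$$= 1 - e^{-4\rho^2/3\sigma^2}, \tag{5.2}$$

which may be compared with the distribution function $1 - e^{-3\rho^2/2\sigma^2}$ for the centroid of the
three points.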
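Result (5.2) lends itself to a simple sampling check. The following is a rough sketch, not part of the paper: it builds the covering circle of three points directly (the midpoint of the longest side when the triangle is right-angled or obtuse, the circumcentre otherwise) and estimates $F_3(\rho)$ by simulation. The function names, the seed and the number of trials are arbitrary choices.

```python
import math
import random


def covering_circle_3(a, b, c):
    """Centre and radius of the smallest circle containing three points."""
    sides = sorted(
        ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2, p, q)
        for p, q in ((a, b), (b, c), (c, a))
    )
    (s1, _, _), (s2, _, _), (s3, p, q) = sides
    if s3 >= s1 + s2:
        # Right-angled, obtuse or collinear: case (ii), longest side is a diameter.
        return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0), math.sqrt(s3) / 2.0
    # Acute-angled: case (i), the circle is the circumcircle.
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ux - ax, uy - ay)


def simulate_centre_cdf(rho, trials=20000, seed=1, sigma=1.0):
    """Monte Carlo estimate of F_3(rho) for circular normal samples of three."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(3)]
        (cx, cy), _ = covering_circle_3(*pts)
        if math.hypot(cx, cy) <= rho:
            hits += 1
    return hits / trials
```

With $\sigma = 1$ and $\rho = 0.5$ the estimate should lie close to $1 - e^{-1/3}$, within sampling error.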
[Fig. 1.]





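When $n$ is large it was shown that $r \sim O(\sqrt{\log n})$. It will appear that if we assume also
that $\rho \sim O(1/\sqrt{\log n})$, a limiting distribution function is obtained which ranges from 0 to 1
so that no other values of $\rho$ need be considered. With $r$ and $\rho$ of the orders stated, we
have from (2.5),

$$P(r,\rho) \sim 1 - e^{-(r^2+\rho^2)/2\sigma^2}\left\{ I_0\!\left(\frac{r\rho}{\sigma^2}\right) + O\!\left(\frac{\rho}{r}\right)\right\} \sim 1 - e^{-r^2/2\sigma^2}\left\{ I_0\!\left(\frac{r\rho}{\sigma^2}\right) + O\!\left(\frac{1}{\log n}\right)\right\},$$

and so (5.1) becomes

$$F_n(\rho) \sim 1 - n^2\int_0^\infty \exp\left[-\frac{r^2}{\sigma^2} - n\,e^{-r^2/2\sigma^2}\, I_0\!\left(\frac{r\rho}{\sigma^2}\right)\right]\frac{r\,dr}{\sigma^2}\left\{1 + O\!\left(\frac{1}{\log n}\right)\right\}.$$

Write $w = n e^{-r^2/2\sigma^2}$, $R = (\rho/\sigma)\sqrt{2\log n}$.
Then

$$\frac{r\rho}{\sigma^2} = R\left(1 - \frac{\log w}{\log n}\right)^{1/2},$$

$$I_0\!\left(\frac{r\rho}{\sigma^2}\right) = I_0(R) - I_1(R)\,\frac{R\log w}{2\log n} + \dots,$$

$$e^{-wI_0(r\rho/\sigma^2)} = e^{-wI_0(R)}\left\{1 + I_1(R)\,\frac{Rw\log w}{2\log n} + \dots\right\}.$$

Hence

$$F_n(\rho) \sim 1 - \int_0^\infty w\,e^{-wI_0(R)}\,dw + O\!\left(\frac{1}{\log n}\right) \sim 1 - \frac{1}{I_0^2(R)} + O\!\left(\frac{1}{\log n}\right), \tag{5.3}$$

which is the limiting distribution function. The frequency function for $R$ is

$$f(R) = \frac{2I_1(R)}{I_0^3(R)}.$$

Fig. 1 shows the form of $f(R)$, which behaves like $R$ for small $R$, and like $4\pi R\,e^{-2R}$ when $R$ is large.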
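The limiting forms are simple to tabulate. This is a minimal sketch assuming the expressions $1 - 1/I_0^2(R)$ and $2I_1(R)/I_0^3(R)$ given above; the Bessel functions are summed from their power series and the function names are illustrative.

```python
import math


def bessel_i(nu, z, terms=80):
    """I_nu(z) for integer nu >= 0 from the series sum_m (z/2)^(2m+nu) / (m! (m+nu)!)."""
    term = (z / 2.0) ** nu / math.factorial(nu)  # the m = 0 term
    total = term
    q = (z / 2.0) ** 2
    for m in range(1, terms):
        term *= q / (m * (m + nu))
        total += term
    return total


def limiting_cdf(R):
    """Limiting distribution function 1 - 1 / I_0(R)^2 of (5.3)."""
    return 1.0 - 1.0 / bessel_i(0, R) ** 2


def limiting_density(R):
    """Frequency function f(R) = 2 I_1(R) / I_0(R)^3."""
    return 2.0 * bessel_i(1, R) / bessel_i(0, R) ** 3
```

A central difference of `limiting_cdf` should reproduce `limiting_density`, and the small-$R$ and large-$R$ behaviour quoted in the text can be checked at, say, $R = 0.001$ and $R = 20$; at the latter value the ratio to $4\pi R e^{-2R}$ is a little below unity because of the $O(1/R)$ terms in the Bessel asymptotics.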


I am indebted to Mr D. A. East for the calculations of Table 1. 


REFERENCES 


Gumbel, E. J. (1936). Ann. Inst. Poincaré, 5, 115.
Pearson, K. (1934). Tables of the Incomplete B-function. London: Biometrika Office.
Quenouille, M. H. (1949). Proc. Edinb. Math. Soc., ser. 2, 8, 95.










THE FREQUENCY JUSTIFICATION OF CERTAIN 
SEQUENTIAL TESTS 


By G. A. BARNARD, Imperial College, London 


1. The primary object of the following paper is to give a ‘frequency justification’, in the 
sense of Neyman and Pearson, of a form of sequential t-test proposed by Barnard (1949). 
When we have done this, we go on to show that the result can be generalized, so that any 
sequential test arrived at by the argument used in the earlier paper will have a ‘frequency 
justification’. Finally, it is pointed out that the discussion has a bearing on the problem of 
‘reconciling’ the three main theories of statistical inference, represented nowadays by 
Professors Fisher, Jeffreys, and Neyman and Pearson. 


2. To explain the meanings of the words we have put in inverted commas, we will consider
first the case of testing two simple hypotheses $\mathcal{H}_j$ $(j = 1, 2)$ against each other. If $E_n$
represents the result of $n$ independent trials, to say that the hypotheses are simple means
that $\Pr\{E_n \mid \mathcal{H}_j\}$ can be calculated as a definite number for each possible result and for
each of the two hypotheses. For a fixed hypothesis, for varying results, $\Pr\{E_n \mid \mathcal{H}_j\}$
represents a (direct) probability (or a probability density), while for a fixed result, for
different hypotheses, $\Pr\{E_n \mid \mathcal{H}_j\}$ represents a likelihood. According to Fisher, statistical
inferences in cases such as this find their natural expression in terms of likelihood, so that
the relevant quantity, if the result $E_n$ has been obtained, is the likelihood ratio

$$\lambda_n = \Pr\{E_n \mid \mathcal{H}_1\}/\Pr\{E_n \mid \mathcal{H}_2\}. \tag{1}$$

If $\lambda_n$ is greater than 1, $\mathcal{H}_1$ is more likely (on $E_n$) than $\mathcal{H}_2$ and the magnitude of $\lambda_n$ measures
how much more likely it is; in the converse situation, a similar interpretation is given to
$1/\lambda_n$. According to Jeffreys, on the other hand, statistical inferences are to be expressed in
terms of (posterior) probability, and in a case such as this one he would say that the ratio
of posterior probabilities, or the posterior odds for $\mathcal{H}_1$ against $\mathcal{H}_2$, would be given by

$$\Pr\{\mathcal{H}_1 \mid E_n\}/\Pr\{\mathcal{H}_2 \mid E_n\} = \lambda_n\,\lambda_0,$$

where

$$\lambda_0 = \Pr\{\mathcal{H}_1\}/\Pr\{\mathcal{H}_2\}$$

is the ratio of the prior probabilities of $\mathcal{H}_1$ and $\mathcal{H}_2$. In the absence of previous knowledge
about the two hypotheses, $\lambda_0$ would be taken to be 1, so that the posterior odds would be
mathematically the same as the likelihood ratio; the difference between the two would
lie in their interpretation only. Finally, according to Neyman and Pearson, the situation
could be interpreted in terms of their 'probabilities of error of the first and second kinds',
$\alpha$ and $\beta$, which are direct probabilities. If it were desired to lay down a discriminating rule
for deciding between $\mathcal{H}_1$ and $\mathcal{H}_2$, on the basis of a definite experimental procedure, the result
$E_n$ would be regarded as one of a set of possible results which might have been obtained with
this procedure; this totality of possible results would then be divided into two sets, $C_j$, such
that if the result obtained belonged to $C_j$ the hypothesis $\mathcal{H}_j$ would be 'accepted'. The risks
of error would then be

$$\alpha = \Pr\{E_n \in C_2 \mid \mathcal{H}_1\} \quad\text{and}\quad \beta = \Pr\{E_n \in C_1 \mid \mathcal{H}_2\}.$$

Now Wald's sequential test procedure is specified by saying that observations are to be
taken in stages, and at each stage the likelihood ratio $\lambda_n$ is to be calculated, and if

$$\alpha'/(1-\beta') < \lambda_n < (1-\alpha')/\beta', \tag{2}$$

then observations are to continue, while they are to stop as soon as either of these two
inequalities is broken. Wald has shown that if this test procedure is used, and if the class
$C_1$ is taken as those results for which the right-hand inequality is broken, while $C_2$ consists
of those results for which the left-hand inequality is broken, and if $\alpha'$ and $\beta'$ are small, then,
approximately,

$$\alpha \approx \alpha' \quad\text{and}\quad \beta \approx \beta'. \tag{3}$$

Now if, in fact, we have a result for which the likelihood ratio is $\lambda_n$ $(> 1)$, we may choose to
regard it as a terminal result of a sequential procedure in which the limits were set by

$$\lambda_n = (1-\alpha')/\beta',$$

and then we could, using Wald's result, regard $\lambda_n$ as expressing the ratio of two direct
probabilities, the probability of rightly accepting $\mathcal{H}_1$, and the probability of wrongly
accepting $\mathcal{H}_1$. A converse interpretation would hold if $\lambda_n$ were less than 1. Thus in the case
of testing two simple hypotheses against each other the three theories of inference can be
'reconciled' in the sense that all three can be taken as saying that the relevant quantity to
calculate is $\lambda_n$, and all three agree that the larger this quantity is (assuming it greater
than 1), the more confident we can be that we would be right to accept $\mathcal{H}_1$. They differ
only in that one interprets $\lambda_n$ as a 'likelihood ratio', another says that $\lambda_n$ is a ratio of
posterior probabilities, while the third interprets $\lambda_n$ as a ratio of direct probabilities; and
this difference comes close to being a merely verbal one. Our object in this paper is to extend
this area of 'agreement' between the three theories of inference to cover some of the cases
where the two hypotheses being tested against each other are composite. In particular,
when we say that a 'likelihood ratio' $\lambda_n$ has a 'frequency justification', we mean that if
it is used as a basis for a Wald sequential procedure (2), then the approximate equalities (3)
hold, for the risks of error.
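The stopping rule (2) is short to write down in code. The following is a minimal sketch, assuming the limits $\alpha'/(1-\beta')$ and $(1-\alpha')/\beta'$ given above; the function names, and the choice of two normal means with known standard deviation as the pair of simple hypotheses, are illustrative and not taken from the paper.

```python
import math


def wald_procedure(likelihood_ratios, alpha, beta):
    """Apply rule (2) to successive likelihood ratios lambda_1, lambda_2, ...

    Returns ("accept H1", n) when the right-hand inequality is broken,
    ("accept H2", n) when the left-hand one is broken, and
    ("continue", n) if the observations run out first.
    """
    lower = alpha / (1.0 - beta)
    upper = (1.0 - alpha) / beta
    n = 0
    for n, lam in enumerate(likelihood_ratios, start=1):
        if lam >= upper:
            return ("accept H1", n)
        if lam <= lower:
            return ("accept H2", n)
    return ("continue", n)


def normal_mean_ratios(xs, mu1, mu2, sigma=1.0):
    """Successive values of Pr{E_n | H1} / Pr{E_n | H2} for N(mu, sigma^2) data."""
    log_lam = 0.0
    for x in xs:
        log_lam += ((x - mu2) ** 2 - (x - mu1) ** 2) / (2.0 * sigma * sigma)
        yield math.exp(log_lam)
```

For example, `wald_procedure(normal_mean_ratios([1.0] * 10, 1.0, 0.0), 0.05, 0.05)` stops in favour of the first hypothesis once the accumulated ratio passes $0.95/0.05 = 19$.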

3. The 'sequential $t$-test' problem we now consider is that of testing against each other
the composite hypotheses:

$\mathcal{H}_j$ $(j = 1, 2)$: $X$ is normally distributed with standard deviation $\sigma$ and mean
$\delta_j\sigma$ $(0 < \sigma < \infty)$.

If $x_i$ $(i = 1, 2, \dots, n)$ represent the results of $n$ independent observations on $X$, the likelihood
function of the observations depends only on $t$ and $s$, defined by

$$n\bar{x} = \sum x_i, \qquad (n-1)s^2 = \sum (x_i - \bar{x})^2, \qquad t = \bar{x}\sqrt{n}/s. \tag{4}$$

Now if $\mathcal{H}_j$ is true about $X$, it is also true about $Y = aX$, for any $a > 0$; and if the observations
are multiplied by $a$, $s$ becomes $as$, while $t$ remains unaltered. Thus the likelihood ratio $\lambda_n$
depends on $t$ only, and is given by

$$\lambda_n = \phi(t \mid \delta_1)/\phi(t \mid \delta_2), \tag{5}$$

where $\phi(t \mid \delta)$ is the probability density function of the non-central $t$ distribution with
parameter $\delta$ and with $(n-1)$ degrees of freedom:

$$\phi(t \mid \delta) = K_n\int_0^\infty u^{n-1}\exp -\tfrac{1}{2}\{(n-1)u^2 + (ut - \delta\sqrt{n})^2\}\,du = K_n\int_0^\infty f_n(u, t \mid \delta)\,du, \text{ say,} \tag{6}$$

$K_n$ being a constant, depending on $n$, such that $\int_{-\infty}^{+\infty}\phi(t \mid \delta)\,dt = 1$.* We propose now to
give a frequency justification of this likelihood ratio, by showing that if the sequential test
procedure (2) is followed, using $\lambda_n$ defined by (5), then the approximate equalities (3) hold
for the risks of error $\alpha$, $\beta$, associated with the composite hypotheses $\mathcal{H}_j$.
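Since $K_n$ cancels in (5), the likelihood ratio needs only the two integrals of (6). The sketch below evaluates them by Simpson's rule; it assumes the integrand of (6) as given above, and the function names, the upper limit of integration and the step count are illustrative choices rather than a recommended practical method (for which the footnote refers to Rushton).

```python
import math


def f_n(u, t, delta, n):
    """Integrand of (6): u^(n-1) exp(-[(n-1) u^2 + (u t - delta sqrt(n))^2] / 2)."""
    return u ** (n - 1) * math.exp(
        -0.5 * ((n - 1) * u * u + (u * t - delta * math.sqrt(n)) ** 2)
    )


def integral_6(t, delta, n, upper=12.0, steps=4000):
    """Simpson's rule over (0, upper); the integrand is negligible beyond upper."""
    h = upper / steps
    total = f_n(0.0, t, delta, n) + f_n(upper, t, delta, n)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f_n(i * h, t, delta, n)
    return total * h / 3.0


def t_statistic(xs):
    """t of (4): xbar * sqrt(n) / s."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return xbar * math.sqrt(n) / s


def likelihood_ratio(xs, delta1, delta2):
    """lambda_n of (5); the constant K_n cancels."""
    t, n = t_statistic(xs), len(xs)
    return integral_6(t, delta1, n) / integral_6(t, delta2, n)
```

Multiplying every observation by a positive constant leaves the computed ratio unchanged, which is the invariance on which the argument of this section rests.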

We notice first that $\alpha$ and $\beta$ are certainly functions of $\alpha'$ and $\beta'$,

$$\alpha = \alpha(\alpha', \beta'), \qquad \beta = \beta(\alpha', \beta'),$$

and they do not depend on $\sigma$. This is so because the criterion $\lambda_n$, used in carrying out the
test, is unaltered when the observations $x_i$ are replaced by $y_i = ax_i$, for any positive $a$,
i.e. when $X$ is replaced by $Y = aX$ and when $\sigma$ is replaced by $a\sigma$; to every test on $X$ there is
associated a test on $Y$, leading to the same conclusion, and vice versa, so that the risks of
error for $X$ and for $Y$ must be the same.

It follows that in calculating the risks of error we can, if we like, give a particular value
to $\sigma$ in $\mathcal{H}_j$, say, for example, $\sigma = 1$; or, alternatively, we can give $\sigma$ a distribution of values.
The risks of error will be the same in all cases, since they do not depend on $\sigma$. In fact, what
we shall do is to give $\sigma$ a distribution of values, in a certain extended sense. We now consider
the hypotheses:

$\mathcal{H}_j(a, b)$: $X$ is normally distributed with standard deviation $\sigma$ and mean $\delta_j\sigma$, and $a < \sigma < b$,
with

$$\Pr\{\sigma_1 < \sigma < \sigma_2\} = \frac{1}{k}\int_{\sigma_1}^{\sigma_2} d\sigma/\sigma, \quad\text{where } k = \log b/a.$$

These are simple hypotheses, in that we can calculate the probability function for the
first $n$ observations $x_i$ $(i = 1, 2, \dots, n)$ as a definite number:

$$\Pr\{E_n \mid \mathcal{H}_j(a, b)\} = \frac{1}{k}\int_a^b \prod_i\left\{\frac{1}{\sqrt{2\pi}\,\sigma}\exp -\tfrac{1}{2}(x_i - \delta_j\sigma)^2/\sigma^2\right\} d\sigma/\sigma\ \prod_i dx_i.$$

Using the relations (4), making the substitution $u = s/\sigma$, $du/u = -d\sigma/\sigma$ in the integral and
cancelling the common factors, we find for the likelihood ratio

$$\lambda_n(a, b) = \Pr\{E_n \mid \mathcal{H}_1(a, b)\}/\Pr\{E_n \mid \mathcal{H}_2(a, b)\} = \int_{s/b}^{s/a} f_n(u, t \mid \delta_1)\,du \Big/ \int_{s/b}^{s/a} f_n(u, t \mid \delta_2)\,du, \tag{7}$$

an expression which would be equal to $\lambda_n$ if $a$ were zero and $b$ were infinite, $s$ being finite. Now

$$f_n(u, t \mid \delta) < (u\exp -\tfrac{1}{2}u^2)^{n-1},$$

so that as $a \to 0$ and $b \to \infty$ the integrals converge absolutely and uniformly in $t$ and $n$ for
$n \geq 2$. Further, by taking $S_1$ small enough and $S_2$ large enough, we can make $\Pr\{S_1 < s < S_2\}$


* For a practical method of approximating to this likelihood ratio, and for further practical aspects 
of the test, we refer to Rushton (1950). 







as near to 1 as we please for any given $\sigma$. Hence, by taking $a$ small enough and $b$ large
enough we can, with probability as near as we please to 1, make $\lambda_n(a, b)$ differ from $\lambda_n$ by
as little as we please, uniformly in $n$ and $t$, for a fixed value of $\sigma$. Now from Wald's result
referred to in paragraph 2, if $\alpha''(\sigma, a, b)$ and $\beta''(\sigma, a, b)$ are the risks of error using the ratio
$\lambda_n(a, b)$ with the test procedure (2) when the standard deviation is $\sigma$, we know that

$$\frac{1}{\log b/a}\int_a^b \alpha''(\sigma, a, b)\,d\sigma/\sigma = \alpha', \text{ approximately,}$$

and

$$\frac{1}{\log b/a}\int_a^b \beta''(\sigma, a, b)\,d\sigma/\sigma = \beta', \text{ approximately.}$$

But the convergence of $\lambda_n(a, b)$ to $\lambda_n$, together with the fact that the limits of the
inequalities (2) are continuous functions of $\alpha'$ and $\beta'$, implies that, as $a \to 0$ and $b \to \infty$,

$$\alpha''(\sigma, a, b) \to \alpha \quad\text{and}\quad \beta''(\sigma, a, b) \to \beta,$$

and so we must have, approximately,

$$\alpha = \alpha' \quad\text{and}\quad \beta = \beta'.$$

This completes the frequency justification of the likelihood ratio (5).


4. We now go on to generalize the argument of the preceding paragraph. Suppose we
have two composite hypotheses

$\mathcal{H}_j$: $X$ has the probability function $\phi(x \mid \gamma_j, \theta)$, $\theta \in \Omega$,

and suppose that

(a) $\Omega$ can be regarded as a group of transformations on the space of $X$.

For example, in the special case of §3, the standard deviation is a positive real number,
and the set of positive real numbers forms a group under the operation of multiplication;
the space of $X$, in this case, is the real axis, and the group of positive real numbers under
multiplication can be regarded as a group of transformations of the real axis into itself,
viz. the 'magnification group', corresponding to changes in the scale of measurement. Next
we suppose that

(b) If $X$ has the probability function $\phi(x \mid \gamma, \theta)$, and $Y = \theta'X$, with $\theta' \in \Omega$, then $Y$ has the
probability function $\phi(y \mid \gamma, \theta'\theta)$.

In the case of §3, if $X$ is normally distributed with parameters $\delta$ and $\sigma$, then $Y = \sigma'X$ is
normally distributed with parameters $\delta$ and $\sigma'\sigma$. The next supposition is that

(c) For any $n$ independent observations $x_i$ $(i = 1, 2, \dots, n)$ there exist a pair of statistics

$$u_n = u_n(x_1, \dots, x_n), \qquad v_n = v_n(x_1, \dots, x_n),$$

which are jointly sufficient for the parameters $\gamma$, $\theta$, so that

$$\prod_i \phi(x_i \mid \gamma, \theta) = f_n(u_n, v_n \mid \gamma, \theta)\, g_n(x_1, \dots, x_n), \tag{8}$$

and which are such that, for any $\theta' \in \Omega$,

$$u_n(\theta'x_1, \dots, \theta'x_n) = u_n(x_1, \dots, x_n)$$

and

$$v_n(\theta'x_1, \dots, \theta'x_n) = \theta'\,v_n(x_1, \dots, x_n).$$

In §3, $u_n = t$ and $v_n = s$, as defined by (4).




Fourthly, we suppose that

(d) $\Omega$ is a locally compact topological group, containing a sequence of compact sets
$\Omega_1 \subset \Omega_2 \subset \dots \subset \Omega_k \subset \dots$ such that $\bigcup_k \Omega_k = \Omega$.

In the case of §3, the fact that the product of two positive numbers is a continuous
function of either of them shows that the positive real numbers under multiplication form
a topological group, and that this group is locally compact is equivalent to the Heine-Borel
theorem. A sequence of sets such as $\Omega_k$ is obtained by taking $\Omega_k$ as the closed interval
$(\epsilon_k, 1/\epsilon_k)$, where $\epsilon_k$ is the $k$th element of a sequence of positive numbers which decrease to 0.
For definitions of the terms used in assumption (d) we refer to Weil (1940), where a concise
and elegant account of the theory to be used subsequently in this paper is also given.

Now by a theorem due to A. Haar, if $\Omega$ is a locally compact topological group we can
define a Borel family $\mathfrak{S}$ of subsets of $\Omega$, containing all the compact sets of $\Omega$, and a measure
$\mu(S)$, defined for all $S \in \mathfrak{S}$, such that for any $\theta \in \Omega$,

$$\mu(S) = \mu(\theta S),$$

where $\theta S$ is the set of images under $\theta$ of the elements of $S$. The measure so defined is called
a (left-) invariant Haar measure, and it is unique up to a numerical constant if we exclude
trivial cases (e.g. $\mu$ identically zero). By means of $\mu$ we can define an integral, for every
function $f(\theta)$ continuous for $\theta \in S$, with the property

$$\int_{\theta \in S} f(\theta)\,d\mu = \int_{\theta \in \theta'S} f(\theta'^{-1}\theta)\,d\mu \quad\text{for any } \theta' \in \Omega.$$


In the case of §3, the Haar measure of an interval $(a, b)$ on the positive real axis is

$$\mu(a, b) = \int_a^b \frac{dx}{x}.$$

To verify the invariance of this measure, we notice that the set $c(a, b)$ is the interval
$(ca, cb)$, and

$$\mu(ca, cb) = \int_{ca}^{cb}\frac{dx}{x} = \int_a^b \frac{dy}{y}, \quad\text{putting } y = x/c,\ dy = dx/c,$$

$$= \mu(a, b).$$
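The invariance just verified can be seen numerically in a line or two. This is a small sketch for the magnification group only; the function names are illustrative, and `prior_probability` is simply the normalized measure used for the hypotheses of §3.

```python
import math


def haar_measure(a, b):
    """mu(a, b): integral of dx/x over (a, b), for 0 < a < b."""
    return math.log(b / a)


def prior_probability(s1, s2, a, b):
    """Pr{s1 < sigma < s2} under the measure dx/x normalized over (a, b)."""
    return haar_measure(s1, s2) / haar_measure(a, b)


# Magnifying the interval (2, 5) by c = 3 leaves its measure unchanged,
# whereas its ordinary length changes from 3 to 9.
unchanged = abs(haar_measure(2.0, 5.0) - haar_measure(6.0, 15.0)) < 1e-12
```

On $(0.1, 10)$, for instance, the intervals $(0.1, 1)$ and $(1, 10)$ each carry probability one half.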


It is apparent that this particular invariant measure was used to obtain the 'prior
distribution' of $\sigma$ involved in the specification of the hypotheses $\mathcal{H}_j(a, b)$ in §3.

We now give the generalization of this part of the argument.

Consider the pair of simple hypotheses:

$\mathcal{H}_j(k)$: $X$ has the probability function $\phi(x \mid \gamma_j, \theta)$, $\theta \in \Omega_k$, with $\Pr\{\theta \in S\} = \mu(S)/\mu(\Omega_k)$,
for $S \subset \Omega_k$.

Using $E_n$ to denote the $n$ independent observations $x_i$, we shall have, using (8), and
omitting a possible factor $\prod_i dx_i$,

$$\Pr\{E_n \mid \mathcal{H}_j(k)\} = \int_{\theta \in \Omega_k} f_n(u_n, v_n \mid \gamma_j, \theta)\, g_n(x_1, \dots, x_n)\,d\mu/\mu(\Omega_k),$$

so that the likelihood ratio is

$$\lambda_n(k) = \frac{\int_{\theta \in \Omega_k} f_n(u_n, v_n \mid \gamma_1, \theta)\,d\mu}{\int_{\theta \in \Omega_k} f_n(u_n, v_n \mid \gamma_2, \theta)\,d\mu}.$$

We now have to assume that the function $f_n$ decreases sufficiently rapidly outside $\Omega_k$.
Specifically, we assume that:
(e) As $k \to \infty$, $\lambda_n(k) \to \lambda_n$, where

$$\lambda_n = \frac{\int_{\theta \in \Omega} f_n(u_n, v_n \mid \gamma_1, \theta)\,d\mu}{\int_{\theta \in \Omega} f_n(u_n, v_n \mid \gamma_2, \theta)\,d\mu},$$

and the convergence is uniform in $u_n$ and in $n$, at least for $n$ sufficiently large.
To complete the generalization we have to show that this $\lambda_n$ is the ratio of the probability
function of $u_n$ on the hypothesis $\mathcal{H}_1$ to the probability function of $u_n$ on the hypothesis $\mathcal{H}_2$.
To find the probability function of $u_n$ we have to integrate the probability function
$\prod_i \phi(x_i \mid \gamma, \theta)$ of the sample point $E_n = (x_1, \dots, x_n)$ over the region of the sample space where
$u_n$ = constant. This will give the probability function apart, possibly, from a function of
$u_n$ only which will represent the volume element between two neighbouring regions of the
sample space corresponding to $u_n$ and $u_n + du_n$; this function of $u_n$ only will cancel when
we take the likelihood ratio, so that it can be ignored. Now let us denote by $U(S)$ the region
of sample space where $u_n = U$ and where $v_n \in S$; note that $S$ is a subset of $\Omega$, since from
condition (c) it follows that the space of $v_n$ is $\Omega$. Now for a fixed value of $U$, the integral

$$\int_{U(S)} g_n(x_1, \dots, x_n)\prod_i dx_i,$$

as a function of $S$, defines a measure in $\Omega$, and we can therefore set it equal to

$$\int_{v_n \in S} m(U, v_n)\,d\mu.$$

Now using (8) we have

$$\Pr\{E_n \in U(S)\} = \int_{v_n \in S} f_n(U, v_n \mid \gamma, \theta)\, m(U, v_n)\,d\mu. \tag{9}$$

But we also have, from condition (b), if we use $\theta_1E_n$ to denote the sample point $(\theta_1x_1, \dots, \theta_1x_n)$,

$$\Pr\{\theta_1E_n \in U(\theta_1S)\} = \int_{v_n \in \theta_1S} f_n(U, v_n \mid \gamma, \theta_1\theta)\, m(U, v_n)\,d\mu$$

$$= \int_{v_n \in S} f_n(U, \theta_1v_n \mid \gamma, \theta_1\theta)\, m(U, \theta_1v_n)\,d\mu, \tag{10}$$

by the invariant property of the measure $\mu$. The two probabilities (9) and (10), since they
correspond to the same event, must be equal. If we remember that the set $S$ is arbitrary,
and compare the integrands in (9) and (10), we see that we must have (at least almost
everywhere, with respect to $\mu$)

$$f_n(U, \theta_1v_n \mid \gamma, \theta_1\theta)\, m(U, \theta_1v_n) = f_n(U, v_n \mid \gamma, \theta)\, m(U, v_n),$$

which shows that, for fixed $U$ and $\gamma$, $f_n(U, v_n \mid \gamma, \theta)\,m(U, v_n)$ is a function of $\theta^{-1}v_n$ only.
Hence the probability function of $u_n$ which is, apart from the function of $u_n$ already noted as
irrelevant,

$$\int_{v_n \in \Omega} f_n(u_n, v_n \mid \gamma, \theta)\, m(u_n, v_n)\,d\mu,$$

is equal to

$$\int_{\theta \in \Omega} f_n(u_n, v_n \mid \gamma, \theta)\, m(u_n, v_n)\,d\mu,$$

and when we take the ratio of the two probability functions the factor $m(u_n, v_n)$ will cancel,
so that we obtain $\lambda_n$ as given in (e). This completes the generalization of the argument of §3.

As to the conditions (a)-(e), it may be remarked that the essential ones are (a), (b) and (c). 
Situations for which the conditions (d) and (e) are not satisfied may, in theory, occur, but 
they would be pathological, and the writer has been unable to think of practical situations 
where the conditions are not satisfied. It should be noted that we have not assumed that 
X is a scalar variate, so that the argument would apply, for example, to the problem of 
testing, in a bivariate normal population, whether the correlation coefficient was $\rho_1$ or $\rho_2$,
the other parameters being unknown; the likelihood ratio would in this case be the ratio
of the probability density function of the sample correlation coefficient $r$ on the hypothesis
$\rho = \rho_1$ to the same thing on the hypothesis $\rho = \rho_2$.


5. We now turn to the consideration of the point of view of Jeffreys. First, we may note 
that the likelihood ratio $\lambda_n$ of §3 would represent, for Jeffreys, the ratio of posterior
probabilities of the corresponding hypotheses, since he takes the prior distribution of $\sigma$,
when nothing is known about it, to be given by the probability element $d\sigma/\sigma$. So that, in
this respect, the argument of §2 extends to §3. But this is not all. Jeffreys has not, so far
as the writer is aware, laid down general principles which determine the choice of prior
distributions in cases of 'ignorance'; but the argument he uses to justify the choice of $d\sigma/\sigma$
strongly suggests that he would, in all cases like that considered in §4, where the conditions 
(a) and (b) are satisfied, choose as his prior distribution the appropriate invariant Haar 
measure. If we assume that this is so, we can say that the argument of §2 extends in this 
way to the general case of §4. 


After the author had developed the argument of §3, while he was working on the 
generalization of §4, he had a conversation with Charles Stein, from which it became clear 
that Stein had already worked out the essential features of the generalization, though his 
mode of presenting the argument differs from mine. Stein is to publish his work, which is 
along rather different general lines from mine. In spite of our difference of outlook, our 
conversation was most helpful to me. 


REFERENCES 


Barnard, G. A. (1949). Statistical inference. J.R. Statist. Soc., Series B, 11, 115.
Rushton, S. (1950). On a sequential t-test. Biometrika, 37, 326.
Weil, A. (1940). L'Intégration dans les Groupes Topologiques et ses Applications. Paris: Hermann
et Cie. (Actualités Scientifiques et Industrielles.)







EXPERIMENTAL DESIGNS FOR SERIALLY 
CORRELATED OBSERVATIONS 


By R. M. WILLIAMS 
Department of Scientific and Industrial Research, Wellington, N.Z. 


1. INTRODUCTION


In many fields of experiment there is evidence of positive or, more rarely, negative correla- 
tion between the errors in adjacent observations; e.g. in yields from plots in a field trial or 
in the observations on a sequence of articles produced in some industrial process. The most 
common method (due to R. A. Fisher) of coping with this correlation is to neutralize it by 
using randomized experimental designs, e.g. randomized blocks or Latin squares. These 
designs permit of a very simple method of analysis of the observations (analysis of variance). 
But there are other possible methods, less convenient but perhaps more sensitive, which 
are worth investigating. 

In experiments where any effect such as fertility trend is likely to change fairly smoothly 
from one plot to the next, and the plots are arranged in a one-dimensional sequence, we can 
use the systematic designs and the methods of analysis developed by Hald (1948), in which 
it is assumed either that the fertility trend can be represented by a polynomial or that it is 
nearly linear, so that it is adequately estimated at any point by a moving average centred 
on it, the residual errors themselves being uncorrelated. 

In this paper we investigate systematic experimental designs which may be used when 
the observations form a one-dimensional sequence, and we may assume that the errors are 
correlated in a stationary linear autoregressive process (without trend). Errors of this type 
may occur in various types of experiment, but for convenience we shall use the terminology 
of field experiments throughout this paper. 

The $p$ treatments are denoted by $1, 2, 3, \ldots, p$, and the corresponding treatment effects
by $a_1, a_2, \ldots, a_p$. The observations, arranged linearly, are denoted by

$$y_0, y_1, \ldots, y_n,$$

and we assume that they may be represented by

$$y_i = a_{[i]} + x_i, \tag{1}$$

where $x_i$ is the error component and $a_{[i]}$ is the effect of the treatment applied to the $i$th plot.
The notation $[i]$ will be used generally to denote the number of the treatment applied to
the $i$th plot.

In the next section we develop the form of design and analysis when the errors are in the
form of a normal stationary Markoff process, i.e.

$$x_i = \rho x_{i-1} + \epsilon_i, \tag{2}$$

where $|\rho| < 1$ and $\epsilon_i$ is normally distributed with zero mean and variance $\sigma^2$, independently
of $x_{i-1}, x_{i-2}$, etc.

In §3 these results are extended to the second-order case, where

$$x_i + d_1 x_{i-1} + d_2 x_{i-2} = \epsilon_i. \tag{3}$$













2. THE DESIGN AND ANALYSIS FOR THE FIRST-ORDER CASE 


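2·1. For variables satisfying equation (2) we have $\operatorname{var}(x_i) = \sigma^2/(1-\rho^2)$, and the correlation
coefficient of any pair of variables $(x_i, x_{i+s})$ is $\rho^s$. The joint distribution of $x_0, x_1, \ldots, x_n$ is

$$\left(\frac{1}{2\pi\sigma^2}\right)^{\frac12(n+1)} (1-\rho^2)^{\frac12} \exp\left[-\frac{1}{2\sigma^2}\left\{x_0^2 + x_n^2 + (1+\rho^2)\sum_{i=1}^{n-1}(x_i^2) - 2\rho\sum_{i=0}^{n-1}(x_i x_{i+1})\right\}\right] dx_0\, dx_1 \ldots dx_n. \tag*{(4)*}$$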
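As a numerical check on the exponent of equation (4), the following minimal sketch compares the bracketed quadratic form with the one obtained by multiplying the marginal density of $x_0$ by the conditional densities of $x_i$ given $x_{i-1}$. The function names are illustrative only and do not come from the paper.

```python
def quadratic_form_eq4(x, rho):
    """Bracketed quadratic form in the exponent of equation (4), x = [x_0, ..., x_n]."""
    n = len(x) - 1
    ends = x[0] ** 2 + x[n] ** 2
    interior = sum(x[i] ** 2 for i in range(1, n))
    cross = sum(x[i] * x[i + 1] for i in range(n))
    return ends + (1 + rho ** 2) * interior - 2 * rho * cross


def quadratic_form_markov(x, rho):
    """Same quantity built from x_0 and the innovations x_i - rho * x_{i-1}."""
    first = (1 - rho ** 2) * x[0] ** 2
    innovations = sum((x[i] - rho * x[i - 1]) ** 2 for i in range(1, len(x)))
    return first + innovations
```

The two functions should agree for any error sequence and any $|\rho| < 1$, which is the algebraic identity behind equation (4).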


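For any given design we may obtain the maximum-likelihood equations for $a_1, a_2, \ldots, a_p, \sigma^2, \rho$
from equation (4) in the form

$$\frac{\hat\rho}{1-\hat\rho^2} = \frac{(n+1)\left[\sum_{i=0}^{n-1}(x_i x_{i+1}) - \hat\rho\sum_{i=1}^{n-1}(x_i^2)\right]}{x_0^2 + x_n^2 + (1+\hat\rho^2)\sum_{i=1}^{n-1}(x_i^2) - 2\hat\rho\sum_{i=0}^{n-1}(x_i x_{i+1})}, \tag{5}$$

$$\hat\sigma^2 = \frac{1-\hat\rho^2}{\hat\rho}\left[\sum_{i=0}^{n-1}(x_i x_{i+1}) - \hat\rho\sum_{i=1}^{n-1}(x_i^2)\right], \tag{6}$$

$$\hat\sigma^2\frac{\partial L}{\partial a_r} = (1+\hat\rho^2)\sum_{[i]=r}(y_i - \hat a_r) - \hat\rho\sum_{[i\pm1]=r}(y_i - \hat a_{[i]}) = 0, \tag{7}$$

$$\hat\sigma^2\frac{\partial L}{\partial a_{[0]}} = (1+\hat\rho^2)\sum_{[i]=[0]}(y_i - \hat a_{[0]}) - \hat\rho\sum_{[i\pm1]=[0]}(y_i - \hat a_{[i]}) - \hat\rho^2(y_0 - \hat a_{[0]}) = 0, \tag{8a}$$

$$\hat\sigma^2\frac{\partial L}{\partial a_{[0]}} = (1+\hat\rho^2)\sum_{[i]=[0]}(y_i - \hat a_{[0]}) - \hat\rho\sum_{[i\pm1]=[0]}(y_i - \hat a_{[i]}) - \hat\rho^2(y_0 + y_n - 2\hat a_{[0]}) = 0. \tag{8b}$$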


Here $L$ is the logarithm of the likelihood. The sum $\sum_{[i\pm1]=r}(y_i)$ denotes the sum over all plots
adjacent to a plot receiving $r$, with the convention that if for any $i$, $[i+1] = [i-1] = r$, the corresponding $y_i$
is included in the sum twice. We use $\hat a_r$, etc., to denote the maximum-likelihood estimators,
but for compactness we use $x_i$, etc., instead of $(y_i - \hat a_{[i]})$, where, as in (5) and (6), no confusion
is likely to result. Equation (7) holds for all $r$ other than $[0]$ and $[n]$; if $[0] \neq [n]$ then (8a)
gives the form for $r = [0]$ and a similar form holds for $r = [n]$; if $[0] = [n]$, both these are
replaced by (8b).

We may choose the design in any form which simplifies the analysis. In order to reduce 
the effect of long trends we shall consider only designs composed of m blocks each containing 
each of the p treatments once, together with, perhaps, one or more extra plots at either end. 
It will be convenient if the variance of the estimate of treatment effect is the same for all
treatments. Two designs that suggest themselves are

I (a)  (1, 2, 3, ..., p) (1, 2, 3, ..., p) (1, 2, 3, ..., p) (1, 2, ...) ...,
and
I (b)  (1, 2, 3, ..., p) (p, p-1, ..., 3, 2, 1) (1, 2, 3, ..., p) (p, p-1, ...) ....


Here each block is enclosed in a bracket for ease of reading. The brackets have no bearing 
whatever on the conduct of the experiment, which is arranged as a single sequence of mp 
plots with no break between blocks. 

Design I (a) is susceptible to linear trends, and in this respect I (b) is better, but both are 
unsatisfactory because the variance of the estimates of treatment differences depends 


* Equations (4), (5) and (6) have been given by Koopmans (1942). 







strongly on the particular treatments. If $y_{ur}$ is the yield from the plot in the $u$th block
receiving the $r$th treatment, the solutions of equations (7) and (8) for I (a) are given by


1 - a, m—1 2 (p\*} 
Bs wis p = _ 
1) (Yn v9) | | (- = ) (“)"] , 
1[(..m-1 p a, m—1 2 (p\*)] 
ea: » ao on = jen p ak pee 
C2 mal (7 m ) (Yi Y.1) + Ymn v») | | (A m y (2 > 


If $m$ is large $\hat a_r$ tends to $\bar y_{.r}$, and the variance of the estimate of treatment differences is
given by


ee rng OP fy _ PON) nn peer 
var Y.-Y.0) = acpeaa| 1 (P ee ae er) +p*"—2) ]. 


a, = Yi Cp" — Cop)”, 








1 p a. m— 
ge oy Yaa Ya) + (7 or 











For design I (b) we have





i (Yur + Ym — 2y. 1) (p" > preti—) 

a,=Y rt 1] -—m—2 F 

m(1 + 2j—- pr 2) 
m m 


Again, if $m$ is large, $\hat a_r$ tends to $\bar y_{.r}$ and


20? ¢ 9 9 9 9 
wee 8.0" Sa [: =F gee + ee gee ne 


pr 21 —pm a +p* prv-e-er} +p -trtrte-t 
$—e—— | 1 ~/— i- tu 
l-p* m |l—p? 2 2 


A more useful design is found by exploiting the fact that the partial correlation between 
non-adjacent pairs of plots is zero. To do this we require 

(A) that each treatment shall occur equally often (say c times) adjacent to every other 
treatment, or 

(B) that each treatment shall occur equally often adjacent to every treatment including 
itself. 

Such designs will be called II (a) and II (b) respectively. For example,

2 (1234) (2314) (3142)   and   1 (1234) (4132) (2413) (3421)

satisfy conditions A and B respectively. The construction of these designs is discussed
in §2·3.
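Conditions A and B can be checked mechanically by counting adjacent pairs along the sequence of plots. The sketch below is an illustration only; the helper names are hypothetical. For condition B, a treatment placed on two adjacent plots is counted twice, since the first plot is adjacent to the second and the second to the first.

```python
from collections import Counter
from itertools import combinations


def pair_counts(sequence, lag=1):
    """Count unordered treatment pairs lying `lag` plots apart."""
    counts = Counter()
    for a, b in zip(sequence, sequence[lag:]):
        counts[tuple(sorted((a, b)))] += 1
    return counts


def condition_a_count(sequence, treatments):
    """Return c if condition A holds for the sequence, otherwise None."""
    counts = pair_counts(sequence)
    if any(counts[(t, t)] for t in treatments):
        return None  # a treatment is adjacent to itself
    values = {counts[pair] for pair in combinations(sorted(treatments), 2)}
    return values.pop() if len(values) == 1 else None


def condition_b_count(sequence, treatments):
    """Return c if condition B holds for the sequence, otherwise None."""
    counts = pair_counts(sequence)
    values = {counts[pair] for pair in combinations(sorted(treatments), 2)}
    values |= {2 * counts[(t, t)] for t in treatments}
    return values.pop() if len(values) == 1 else None


# The two example designs, including the extra leading plot.
design_a = [int(ch) for ch in "2" + "1234" + "2314" + "3142"]
design_b = [int(ch) for ch in "1" + "1234" + "4132" + "2413" + "3421"]
```

For the two example designs both checks give $c = 2$.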

In order to ensure that condition A or B is fulfilled, an additional plot receiving the
treatment applied to the last plot must be placed at the beginning of the set of blocks.
(This could be avoided if the arrangement of the plots was circular, but this is not likely to
be possible in practice.) This has the further advantage that since the treatments applied
to plots 0 and $n$ are the same, the asymmetry due to end-effects is confined to one treatment
and equation (8b) is appropriate. The condition A (or B) ensures that the family of
equations given by (7) and (8b) is symmetrical in all the $a_r$'s except that the coefficient of
$\hat a_{[0]}$ in (8b) is too large by an amount $(1-\rho^2)$. The remaining asymmetry in (8b) may be
eliminated by assuming that the variance of the observation on the first plot is $\sigma^2 h$, where
$h$ is large, instead of $\sigma^2/(1-\rho^2)$, but that the distribution of $(x_i - \rho x_{i-1})$ for $i = 1, 2, \ldots$ is
otherwise unchanged.* The effect of this is to reduce the weight given to the first plot by







* A similar device is used by Quenouille (1949); the writer is grateful for the opportunity to see that 
paper before its publication. 













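a factor $\rho^2$. If the correlation is high this is negligible, and if very small it is equivalent to
omitting this observation altogether. Equation (8b) is now replaced by

$$(1+\hat\rho^2)\left[\frac{\hat\rho^2 y_0}{1+\hat\rho^2} + \frac{y_n}{1+\hat\rho^2} + \sum_{[i]=[0]}(y_i) - y_0 - y_n - m\hat a_{[0]}\right] - \hat\rho\sum_{[i\pm1]=[0]}(y_i - \hat a_{[i]}) = 0, \tag{9}$$

and, provided we adopt the convention that the weighted sum in square brackets is to
replace $\sum_{[i]=[0]}(y_i - \hat a_{[0]})$ (which would have $m+1$ terms), we can write this in the same form
as equation (7), and we have, for condition A,

$$(1+\hat\rho^2)\left[\sum_{[i]=r}(y_i) - m\hat a_r\right] - \hat\rho\left[\sum_{[i\pm1]=r}(y_i) - c\left(\sum_{s=1}^{p}(\hat a_s) - \hat a_r\right)\right] = 0, \tag{10}$$

and for condition B,

$$(1+\hat\rho^2)\left[\sum_{[i]=r}(y_i) - m\hat a_r\right] - \hat\rho\left[\sum_{[i\pm1]=r}(y_i) - c\sum_{s=1}^{p}(\hat a_s)\right] = 0. \tag{11}$$

The solution of equation (10) is

$$\hat a_r = \frac{(1+\hat\rho^2)\sum_{[i]=r}(y_i) - \hat\rho\sum_{[i\pm1]=r}(y_i) + \dfrac{2\hat\rho\gamma}{p-1}}{m\left(1+\hat\rho^2+\dfrac{2\hat\rho}{p-1}\right)}, \tag{12}$$

where, by the relations in §2·3,

$$\gamma = \sum_{i=1}^{n}(y_i) + \frac{\hat\rho}{1-\hat\rho}(y_n - y_0) = m\sum_{s=1}^{p}(\hat a_s).$$

Since $\gamma$ is independent of $r$ it disappears when we estimate treatment differences.

If $\rho \to 0$,
$$\hat a_r \to \frac{1}{m}\sum_{[i]=r}(y_i),$$
and if $\rho \to 1$,
$$\hat a_r \to \frac{p-1}{mp}\left[\sum_{[i]=r}(y_i) - \tfrac12\sum_{[i\pm1]=r}(y_i)\right] + \frac{\gamma}{mp}.$$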

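The value of $\hat\rho$ can be obtained by successive approximations from equations (5) and (12),
and $\hat\sigma^2$ from equation (6). If $\rho$ is known the maximum-likelihood estimates $\hat a_r$ are normally
distributed and their variance matrix is given exactly by

$$V = \left[-E\left(\frac{\partial^2 L}{\partial a_r\,\partial a_s}\right)\right]^{-1}.$$

In practice $\hat\rho$ and $\hat\sigma^2$ are both subject to sampling errors, but for large $n$ their distributions
are approximately normal, and the variances of these estimates are therefore given to order
$1/n$ by inverting the matrix

$$\begin{bmatrix}
\dfrac{m(1+\rho^2)}{\sigma^2} & -\dfrac{2m\rho}{(p-1)\sigma^2} & \cdots & 0 & 0 \\
-\dfrac{2m\rho}{(p-1)\sigma^2} & \dfrac{m(1+\rho^2)}{\sigma^2} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \dfrac{n}{1-\rho^2} & 0 \\
0 & 0 & \cdots & 0 & \dfrac{n}{2\sigma^4}
\end{bmatrix},$$

which leads to

$$\begin{aligned}
\operatorname{var}(\hat a_r) &= \frac{\sigma^2\left[(p-1)(1+\rho^2) - 2(p-2)\rho\right]}{m\left[(p-1)(1+\rho^2)+2\rho\right](1-\rho)^2},\\
\operatorname{cov}(\hat a_r, \hat a_s) &= \frac{2\sigma^2\rho}{m\left[(p-1)(1+\rho^2)+2\rho\right](1-\rho)^2},\\
\operatorname{var}(\hat a_r - \hat a_s) &= \frac{2\sigma^2}{m\left(1+\rho^2+\dfrac{2\rho}{p-1}\right)} = V_{II(a)},\\
\operatorname{var}(\hat\rho) &= \frac{1-\rho^2}{n},\\
\operatorname{var}(\hat\sigma^2) &= \frac{2\sigma^4}{n},\\
\operatorname{cov}(\hat\rho, \hat\sigma^2) &= \operatorname{cov}(\hat\rho, \hat a_r) = \operatorname{cov}(\hat\sigma^2, \hat a_r) = 0.
\end{aligned} \tag{13}$$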





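For design II (b) we have from equation (11)

$$\hat a_r = \frac{(1+\hat\rho^2)\sum_{[i]=r}(y_i) - \hat\rho\sum_{[i\pm1]=r}(y_i) + \dfrac{2\hat\rho\gamma}{p}}{m(1+\hat\rho^2)}. \tag{14}$$

Here, if $\rho \to 0$,
$$\hat a_r \to \frac{1}{m}\sum_{[i]=r}(y_i),$$
and if $\rho \to 1$,
$$\hat a_r \to \frac{1}{m}\left[\sum_{[i]=r}(y_i) - \tfrac12\sum_{[i\pm1]=r}(y_i)\right] + \frac{\gamma}{mp}.$$

Also

$$\begin{aligned}
\operatorname{var}(\hat a_r) &= \frac{\sigma^2\left[m(1+\rho^2) - (p-1)\rho c\right]}{m(1+\rho^2)\left[m(1+\rho^2) - p\rho c\right]},\\
\operatorname{cov}(\hat a_r, \hat a_s) &= \frac{\sigma^2\rho c}{m(1+\rho^2)\left[m(1+\rho^2) - p\rho c\right]},\\
\operatorname{var}(\hat a_r - \hat a_s) &= \frac{2\sigma^2}{m(1+\rho^2)} = V_{II(b)}.
\end{aligned} \tag{15}$$

$\operatorname{cov}(\hat a_r, \hat\rho)$, $\operatorname{cov}(\hat a_r, \hat\sigma^2)$, $\operatorname{cov}(\hat\rho, \hat\sigma^2)$, $\operatorname{var}(\hat\rho)$ and $\operatorname{var}(\hat\sigma^2)$ are the same as for type II (a).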
2·2. The efficiency of designs II (a) and II (b). If, instead of a design of type II (a) or II (b),
a randomized block design is applied to a set of observations arranged linearly in $m$ blocks
of $p$ treatments, in which the variance of each observation is $\sigma_x^2$ and the correlation between
$x_i$ and $x_{i+s}$ is $\rho_s$, then the treatment difference is estimated by the difference of the means
of the plots receiving those treatments, and the variance of this difference is given by

$$2\sigma_x^2(1-\bar\rho)/m,$$

where $\bar\rho$ is the average correlation between any two plots in the same block, taken over
the $p!$ possible arrangements.

For a general stationary process

$$\bar\rho = \frac{2}{p(p-1)}\sum_{s=1}^{p-1}(p-s)\rho_s,$$

and in the case of the linear Markoff process where $\rho_s = \rho^s$ and $\sigma_x^2 = \sigma^2/(1-\rho^2)$, the variance is

$$V_R = \frac{2\sigma^2}{m(1-\rho^2)}\left[1 - \frac{2\rho}{p(p-1)(1-\rho)}\left(p - \frac{1-\rho^p}{1-\rho}\right)\right].$$
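The efficiency ratios can be evaluated directly from the variance formulae. The sketch below is illustrative and its function names are hypothetical; it assumes $V_{II(a)} = 2\sigma^2/[m(1+\rho^2+2\rho/(p-1))]$ and $V_{II(b)} = 2\sigma^2/[m(1+\rho^2)]$ from equations (13) and (15), and works throughout with variances multiplied by $m/2\sigma^2$, so that $m$ and $\sigma^2$ cancel in the ratios.

```python
def mean_block_correlation(rho, p):
    """Average correlation between two plots of a block of p plots (rho_s = rho**s)."""
    total = sum((p - s) * rho ** s for s in range(1, p))
    return 2.0 * total / (p * (p - 1))


def v_randomized(rho, p):
    """(m / 2 sigma^2) * V_R for the first-order process, |rho| < 1."""
    return (1.0 - mean_block_correlation(rho, p)) / (1.0 - rho ** 2)


def v_type_2a(rho, p):
    """(m / 2 sigma^2) * V_II(a)."""
    return 1.0 / (1.0 + rho ** 2 + 2.0 * rho / (p - 1))


def v_type_2b(rho, p):
    """(m / 2 sigma^2) * V_II(b); p is unused but kept for a uniform signature."""
    return 1.0 / (1.0 + rho ** 2)


def efficiency(rho, p, design="a"):
    """V_R / V_II(a) or V_R / V_II(b)."""
    v_sys = v_type_2a(rho, p) if design == "a" else v_type_2b(rho, p)
    return v_randomized(rho, p) / v_sys
```

For example, `efficiency(0.5, 3, "a")` and `efficiency(0.5, 3, "b")` give about 1.361 and 0.972, in agreement with the corresponding entries of Tables 1 and 2.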








The efficiency of the systematic designs compared with the randomized scheme is given
by $V_R/V_{II(a)}$ and $V_R/V_{II(b)}$. These are given in Tables 1 and 2 for $p = 2, 3, 4, 5, 10, \infty$.


Table 1. V_R/V_II(a)

  rho     p = 2       3       4       5      10     inf

  1.0      2.00   2.000   2.222   2.500   4.070     inf
  0.9      1.90   1.854   2.010   2.208   3.202   9.526
  0.8      1.80   1.717   1.817   1.953   2.564   4.556
  0.7      1.70   1.589   1.648   1.733   2.096   2.922
  0.6      1.60   1.470   1.496   1.545   1.749   2.125
  0.5      1.50   1.361   1.363   1.387   1.492   1.667
  0.4      1.40   1.263   1.250   1.257   1.303   1.381
  0.3      1.30   1.183   1.161   1.158   1.169   1.198
  0.2      1.20   1.108   1.088   1.082   1.080   1.083
  0.1      1.10   1.043   1.029   1.025   1.020   1.020
  0        1.00   1.000   1.000   1.000   1.000   1.000
 -0.1      0.90   0.977   0.977   1.010   1.016   1.020
 -0.2      0.80   0.980   1.028   1.034   1.072   1.083
 -0.3      0.70   1.016   1.100   1.134   1.178   1.198
 -0.4      0.60   1.098   1.231   1.283   1.350   1.381
 -0.5      0.50   1.250   1.451   1.525   1.624   1.667
 -0.6      0.40   1.520   1.824   1.925   2.067   2.125
 -0.7      0.30   2.019   2.496   2.632   2.842   2.922
 -0.8      0.20   3.080   3.910   4.097   4.443   4.556
 -0.9      0.10   6.370   8.288   8.569   9.335   9.520
 -1.0      0        inf     inf     inf     inf     inf

Table 2. V_R/V_II(b)

  rho     p = 2       3       4       5      10     inf

  1.0     1.000   1.333   1.667   2.000   3.667     inf
  0.9     0.953   1.238   1.510   1.768   2.884   9.526
  0.8     0.911   1.154   1.371   1.570   2.313   4.556
  0.7     0.876   1.081   1.255   1.404   1.898   2.922
  0.6     0.850   1.020   1.156   1.266   1.593   2.125
  0.5     0.833   0.972   1.077   1.156   1.370   1.667
  0.4     0.829   0.939   1.017   1.072   1.211   1.381
  0.3     0.838   0.927   0.981   1.018   1.102   1.198
  0.2     0.867   0.929   0.964   0.987   1.036   1.083
  0.1     0.918   0.949   0.966   0.976   0.998   1.020
  0       1.000   1.000   1.000   1.000   1.000   1.000
 -0.1     1.122   1.085   1.068   1.062   1.039   1.020
 -0.2     1.300   1.213   1.179   1.144   1.120   1.083
 -0.3     1.557   1.401   1.347   1.315   1.255   1.198
 -0.4     1.933   1.676   1.598   1.550   1.462   1.381
 -0.5     2.500   2.083   1.979   1.906   1.782   1.667
 -0.6     3.400   2.720   2.584   2.470   2.201   2.125
 -0.7     4.967   3.808   3.634   3.440   3.174   2.922
 -0.8     8.200   6.013   5.795   5.418   4.983   4.556
 -0.9    18.100  12.670  12.398  11.405  10.404   9.520
 -1.0       inf     inf     inf     inf     inf     inf





































For large positive or negative correlations there is a considerable increase in efficiency
for all values of $p > 2$. The curious behaviour of design II (a) for $p = 2$ is understandable
when it is realized that the only such design is of the form

2 (12) (12) (12) ...

Such a design is very inefficient for negative correlations.

The relative efficiency of II (a) and II (b) is given by

$$\frac{V_{II(b)}}{V_{II(a)}} = 1 + \frac{2\rho}{(p-1)(1+\rho^2)}.$$

For positive correlations II (a) and for negative correlations II (b) is more efficient. The
difference is important if $p$ is small and $|\rho|$ approaches unity.

2·3. The construction of type II designs. In order to satisfy the conditions of design II (a)
or II (b), $m$ the number of blocks, $p$ the number of treatments and $c$ the number of times a given
treatment is adjacent to another, must satisfy the relations

$$2m = c(p-1) \quad \text{for II (a)}$$
or
$$2m = cp \quad \text{for II (b)}.$$

For II (b) we have the further restriction that $c$ must be even, since if a treatment occurs
in adjacent plots the first is adjacent to the second and the second adjacent to the first, so
that $c$ can only change by multiples of two.

If it is necessary to keep the design as short as possible we can put $c = 1$, and since for
this, or any odd value of $c$, the condition on II (a) requires that $p$ shall be odd, we have to
introduce an extra treatment if $p$ is even. If $c$ is even we may use II (a) if the correlation
between adjacent plots is expected to be positive, II (b) if it is negative.
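The relations between $m$, $p$ and $c$ fix the length of the shortest design of each type. A small illustrative helper (the name is hypothetical) that returns the smallest admissible $c$ and the corresponding number of blocks, without introducing an extra treatment, might read:

```python
def shortest_design_size(p, design="a"):
    """Smallest adjacency count c and block count m for p treatments."""
    if design == "a":
        c = 1 if p % 2 == 1 else 2  # odd c requires p odd, since 2m = c(p - 1)
        return c, c * (p - 1) // 2
    if design == "b":
        c = 2                       # c must be even for II (b), and 2m = cp
        return c, c * p // 2
    raise ValueError("design must be 'a' or 'b'")
```

For instance $p = 5$ gives $c = 1$, $m = 2$ for type II (a), while $p = 4$ gives $c = 2$ with $m = 3$ for II (a) and $m = 4$ for II (b).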

A design may be made up by taking a design for say, c = 1, and repeating this design in 
a continuous chain: or we may repeat the design but rename the treatments, randomly or 
systematically, at each repetition, subject to the restrictions that the treatments applied 
to the first and last plots should be the same in each repetition. Alternatively, we can make 
up a design for, say, $p = 5$, $c = 7$, composed of one design for $c = 1$, followed by three
different designs for $c = 2$, subject to the same restriction as above; or we can make a design
in which the condition A (§2·1) is satisfied for the design as a whole, but not for any part of it.
This last is laborious to construct and is not likely to be of general use, and most designs
are likely to be made up of a sequence of basic designs for which $c$ is 1 or 2.

If each repetition of the basic design is independent of the others (e.g. in an experiment 
on animals, if each repetition is applied to a different animal), then the permutations can 
be made without the restriction on the first and last treatments. 

No general algebraic method has been found for the construction of these designs, but 
a number of them have been constructed fairly easily by means of the diagram in Fig. 1. 
The treatments are arranged round a circle and the design traced out by joining the treat- 
ments by a continuous line so that all possible joins are made c times. 

For designs of type II (b) we can represent the passage from one plot to an adjacent
one receiving the same treatment by one or more closed loops at each vertex according
as $c$ is 2, 4, etc.

If we relax the condition that the blocks should contain each treatment once and only 
once then we can apply to this figure the theorem, due to Euler, that a figure which has no 













odd node can be described unicursally in a re-entrant route (see, for example, Rouse Ball’s 
Mathematical Essays and Recreations, p. 173). This shows that if c is even designs of both 
types exist for p odd or even; and such designs can be constructed by means of Trémaux’s 
rule for threading a maze (ibid. p. 183). If c is odd the only designs that exist are 
type II (a) for p odd; Trémaux’s rule does not then apply. 
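When the block condition is relaxed, a re-entrant route that makes every join between distinct treatments exactly $c$ times can be found by any Eulerian-circuit method. The sketch below uses Hierholzer's algorithm rather than Trémaux's rule, purely as an illustration; the function name is hypothetical, self-joins (type II (b) loops) are not included, and the result is a sequence of treatments with no guarantee that it splits into blocks containing each treatment once.

```python
def eulerian_treatment_sequence(p, c=1):
    """Closed route using every join between distinct treatments c times.

    Hierholzer's algorithm on the complete graph on treatments 1..p with
    each join repeated c times. The block structure is NOT enforced.
    """
    if (c * (p - 1)) % 2:
        raise ValueError("every node must be even: c * (p - 1) must be even")
    # remaining[u][v] = number of unused joins between u and v
    remaining = {u: {v: c for v in range(1, p + 1) if v != u}
                 for u in range(1, p + 1)}
    stack, route = [1], []
    while stack:
        u = stack[-1]
        nxt = next((v for v, k in remaining[u].items() if k > 0), None)
        if nxt is None:
            route.append(stack.pop())  # no unused join left at u
        else:
            remaining[u][nxt] -= 1
            remaining[nxt][u] -= 1
            stack.append(nxt)
    return route
```

With $p = 5$, $c = 1$ the route visits 11 plots, starts and ends on the same treatment, and makes each of the ten possible joins once.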











Fig. 1. Construction of type II (a) design (12345) (24153) with p = 5, c = 1.


If the above condition is imposed, this rule may fail because the only join still to be made 
at some point is to a treatment which has already appeared in the block. It has, however, 
been found in practice that this situation is easily avoided. 

Examples of designs constructed in this manner for c = 1,2, p = 3, 4, 5, ..., 10, are given 
below. 

II (a) 
c=1 
(123) 
(12345) (24135) 
(1234567) (2461357) (3625147) 
(123456789) (246813579) (369471582) (591483726) 
c = 2
(12)
(123) (213) 
(1234) (2314) (3142) 
(12345) (34152) (45312) (31425) 
(123456) (253641) (531642) (516324) (314562) 
(1234567) (5364721) (4251637) (1354762) (5643271) (6241573) 
(12345678) (64251387) (53627481) (73416582) (63748125) (32458671) (64138275) 
(123456789) (759864231) (473852691) (827163594) (876925143) (947563821) (354681927) 
(397162485) 
(123456789 10) (685724913 10) (28364795 10 1) (48 10 7392615) (243517698 10) (418596 10 327) 
(36475 10 9281) (938456217 10) (497861 10 253) 


II (b)
c = 2

(12) (21) 

(123) (312) (231) 

(1234) (4132) (2413) (3421) 

(12345) (53142) (25134) (45123) (35241) 

(123456) (613524) (462315) (524163) (346512) (263541) 

(1234567) (7246135) (5147362) (2571364) (4576123) (3471526) (6537241) 

(12345678) (81357246) (68251473) (36178452) (26713485) (56812374) (46153827) (75836241) 

(123456789) (913572468) (814792563) (371582694) (483957126) (613458927) (781469325) 
(591683742) (284976351) 

(123456789 10) (10 246813579) (9147 10 36258) (8271593 10 64) (4 10 16923875) (5 10 84937162) 
(21 10 9854376) (65 10 8427913) (35149286 10 7) (74 10 2596381) 










3. THE EXTENSION TO THE SECOND ORDER PROCESS

The relation between the errors instead of being $x_i = \rho x_{i-1} + \epsilon_i$ may be of the form

$$x_i + d_1 x_{i-1} + d_2 x_{i-2} = \epsilon_i, \tag{16}$$

where again $\epsilon_i$ is normally distributed about zero with variance $\sigma^2$. We have the following
well-known results:

(i) The process is stationary if, and only if, $-1 < d_2 < 1$ and $-(1+d_2) < d_1 < (1+d_2)$.

(ii) The variance of $x_i$ is

$$\frac{\sigma^2(1+\xi_1\xi_2)}{(1-\xi_1^2)(1-\xi_2^2)(1-\xi_1\xi_2)} = \frac{\sigma^2(1+d_2)}{(1-d_2)\{(1+d_2)^2 - d_1^2\}},$$

where $\xi_1, \xi_2$ are the roots of $\xi^2 + \xi d_1 + d_2 = 0$.

(iii) The correlation $\rho_s$ between $x_i$ and $x_{i+s}$ is given by

$$\rho_s + d_1\rho_{s-1} + d_2\rho_{s-2} = 0, \tag{17}$$

so that

$$\rho_s = c_1\xi_1^s + c_2\xi_2^s,$$

where

$$c_1 = \frac{\xi_1(1-\xi_2^2)}{(\xi_1-\xi_2)(1+\xi_1\xi_2)}, \qquad c_2 = -\frac{\xi_2(1-\xi_1^2)}{(\xi_1-\xi_2)(1+\xi_1\xi_2)}.$$

Two problems now arise: 

(i) Can a design be found which will simplify the analysis as in the first order case? 

(ii) What errors will be introduced if the design is analysed on the assumption that it is
a first-order process, when in fact it is a second-order process?

Some designs do exist which simplify the analysis. These are obtained by imposing 
simultaneously two conditions on the design: 

(1) That each treatment should occur equally often adjacent to every other treatment. 

(2) That each treatment should occur equally often adjacent but one to every other 
treatment. 

Such designs will be called type III and must fulfil the condition for II (a) that
$2m = c(p-1)$ (designs corresponding to II (b) do not exist). The construction of such
designs involves a good deal of trial and error, but is made easier by the simultaneous use
of two of the circular diagrams described above.

Examples of such designs are:

23 (123) (123),
24 (1234) (2134) (1324),
43 (12345) (24153) (14532) (51243).
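Conditions (1) and (2) can be verified by counting pairs of treatments one plot apart and two plots apart. The sketch below is illustrative only, with hypothetical names; it takes the sequence including the two leading plots, counts lag-1 pairs from the second leading plot onwards and lag-2 pairs from the first, and returns the common count $c$ when both conditions hold.

```python
from collections import Counter
from itertools import combinations


def lagged_pair_counts(sequence, lag):
    """Count unordered treatment pairs lying `lag` plots apart."""
    counts = Counter()
    for a, b in zip(sequence, sequence[lag:]):
        counts[tuple(sorted((a, b)))] += 1
    return counts


def type_3_count(sequence, p):
    """Return c if conditions (1) and (2) both hold with the same c, else None.

    `sequence` starts with the two repeated plots, followed by the blocks.
    """
    treatments = range(1, p + 1)
    found = set()
    for lag in (1, 2):
        counts = lagged_pair_counts(sequence[2 - lag:], lag)
        if any(counts[(t, t)] for t in treatments):
            return None
        found |= {counts[pair] for pair in combinations(treatments, 2)}
    return found.pop() if len(found) == 1 else None


examples = {
    3: "23" + "123" + "123",
    4: "24" + "1234" + "2134" + "1324",
    5: "43" + "12345" + "24153" + "14532" + "51243",
}
```

All three example designs give $c = 2$, in agreement with $2m = c(p-1)$.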


In this case it is necessary to repeat the last two plots at the beginning of the design in
order to satisfy conditions (1) and (2). If it is assumed that the variance of these two
observations (say $h\sigma^2$) is large compared with $\sigma^2$, we can eliminate the asymmetry due to
end effects.

If the errors in the observations are named $x_{-1}, x_0, x_1, x_2, \ldots, x_n$ then the distribution
will be approximately

$$\left(\frac{1}{2\pi\sigma^2}\right)^{\frac12(n+2)} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i + d_1x_{i-1} + d_2x_{i-2})^2\right]\prod_{i=-1}^{n}(dx_i). \tag{18}$$













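The maximum-likelihood equations are:

$$\begin{aligned}
\frac{\partial L}{\partial d_1} &= -\frac{1}{\hat\sigma^2}\left[\sum_{1}^{n}(x_ix_{i-1}) + \hat d_1\sum_{1}^{n}(x_{i-1}^2) + \hat d_2\sum_{1}^{n}(x_{i-1}x_{i-2})\right] = 0,\\
\frac{\partial L}{\partial d_2} &= -\frac{1}{\hat\sigma^2}\left[\hat d_1\sum_{1}^{n}(x_{i-1}x_{i-2}) + \hat d_2\sum_{1}^{n}(x_{i-2}^2) + \sum_{1}^{n}(x_{i-2}x_i)\right] = 0,\\
\frac{\partial L}{\partial \sigma^2} &= -\left(\frac{n+2}{2}\right)\frac{1}{\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\sum_{1}^{n}(x_i + \hat d_1x_{i-1} + \hat d_2x_{i-2})^2 = 0.
\end{aligned} \tag{19}$$

The general expression for $\partial L/\partial a_r$ ($r \neq [n-1], [n]$) is

$$\frac{\partial L}{\partial a_r} = \frac{1}{\sigma^2}\left[(1+d_1^2+d_2^2)\sum_{[i]=r}(x_i) + (d_1+d_1d_2)\sum_{[i\pm1]=r}(x_i) + d_2\sum_{[i\pm2]=r}(x_i)\right] = 0, \tag{20}$$

and, as in the first order case, $\partial L/\partial a_{[n-1]}$ and $\partial L/\partial a_{[n]}$ can be written in this form provided
we take appropriately weighted means for some of the terms involved. These weights can
easily be obtained from the equations.

With this interpretation, the maximum-likelihood estimates of $a_r$ are

$$\hat a_r = \frac{1}{m\left[1+\hat d_1^2+\hat d_2^2 - \dfrac{2}{p-1}(\hat d_1+\hat d_2+\hat d_1\hat d_2)\right]}\left[(1+\hat d_1^2+\hat d_2^2)\sum_{[i]=r}(y_i) + (\hat d_1+\hat d_1\hat d_2)\sum_{[i\pm1]=r}(y_i) + \hat d_2\sum_{[i\pm2]=r}(y_i) - \frac{2m}{p-1}(\hat d_1+\hat d_2+\hat d_1\hat d_2)\sum_{1}^{p}(\hat a_s)\right], \tag{21}$$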
? “ * a. 7 j2 = 92 _ j jz 
where m3 (a) ra S(y,) + 1 Yur) (dy dy + 3) + (Yo Yo di Cede G) 
1 1 (l+d,+4d,)? 


The solution of equation (19) can be obtained by successive approximation as before.
The term in $\sum_{1}^{p}(\hat a_s)$ is required only in the estimation of $d_1, d_2$, and not directly in comparing
treatment differences.


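The variances, obtained as in §2, are (to order $1/n$)

$$\begin{aligned}
\operatorname{var}(\hat a_r) &= \frac{\sigma^2\left[1+d_1^2+d_2^2+\dfrac{2(p-2)}{p-1}(d_1+d_2+d_1d_2)\right]}{m\left[1+d_1^2+d_2^2-\dfrac{2}{p-1}(d_1+d_2+d_1d_2)\right](1+d_1+d_2)^2},\\
\operatorname{cov}(\hat a_r, \hat a_s) &= \frac{-\dfrac{2\sigma^2}{p-1}(d_1+d_2+d_1d_2)}{m\left[1+d_1^2+d_2^2-\dfrac{2}{p-1}(d_1+d_2+d_1d_2)\right](1+d_1+d_2)^2},\\
\operatorname{var}(\hat a_r - \hat a_s) &= \frac{2\sigma^2}{m\left[1+d_1^2+d_2^2-\dfrac{2}{p-1}(d_1+d_2+d_1d_2)\right]},\\
\operatorname{var}(\hat d_1) &= \operatorname{var}(\hat d_2) = (1-d_2^2)/n,\\
\operatorname{var}(\hat\sigma^2) &= 2\sigma^4/n,\\
\operatorname{cov}(\hat d_1, \hat d_2) &= d_1(1-d_2)/n,\\
\operatorname{cov}(\hat\sigma^2, \hat d_1) &= \operatorname{cov}(\hat\sigma^2, \hat d_2) = \operatorname{cov}(\hat\sigma^2, \hat a_r) = \operatorname{cov}(\hat d_1, \hat a_r) = \operatorname{cov}(\hat d_2, \hat a_r) = 0.
\end{aligned} \tag{22}$$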





The efficiency of these designs compared with a randomized scheme is shown in Table 3
for $p = 5$, for various values of $d_1, d_2$.
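The quantities compared in this efficiency calculation can be computed along the following lines. This is a sketch under stated assumptions, with hypothetical function names: the systematic variance is taken as $2\sigma^2/\{m[1+d_1^2+d_2^2-2(d_1+d_2+d_1d_2)/(p-1)]\}$ from (22), the randomized variance as $(2/m)\operatorname{var}(x_i)(1-\bar\rho)$ from §2·2, and both are multiplied by $m/2\sigma^2$.

```python
def systematic_factor(d1, d2, p):
    """(m / 2 sigma^2) * var(a_r - a_s) for a type III design."""
    s = 1.0 + d1 ** 2 + d2 ** 2
    k = d1 + d2 + d1 * d2
    return 1.0 / (s - 2.0 * k / (p - 1))


def randomized_factor(d1, d2, p):
    """(m / 2 sigma^2) * variance of a treatment difference under randomized blocks."""
    if not (-1.0 < d2 < 1.0 and -(1.0 + d2) < d1 < 1.0 + d2):
        raise ValueError("process is not stationary")
    var_ratio = (1.0 + d2) / ((1.0 - d2) * ((1.0 + d2) ** 2 - d1 ** 2))
    rho = [1.0, -d1 / (1.0 + d2)]
    for s in range(2, p):  # correlations from the recursion (17)
        rho.append(-d1 * rho[s - 1] - d2 * rho[s - 2])
    rho_bar = 2.0 * sum((p - s) * rho[s] for s in range(1, p)) / (p * (p - 1))
    return var_ratio * (1.0 - rho_bar)


def type_3_efficiency(d1, d2, p):
    """Variance under randomization divided by variance under a type III design."""
    return randomized_factor(d1, d2, p) / systematic_factor(d1, d2, p)
```

With $p = 5$, $d_1 = 0$ and $d_2 = 0.3$ the two factors come to about 1.06 and 1.19.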


Table 3

  d_1         d_2 = -0.9    -0.6    -0.3     0.0     0.3     0.6     0.9

  1.8  (i)           ..      ..      ..      ..      ..      ..    0.35
       (ii)          ..      ..      ..      ..      ..      ..   62.80
       (iii)         ..      ..      ..      ..      ..      ..  181.50

  1.5  (i)           ..      ..      ..      ..      ..    0.47    0.46
       (ii)          ..      ..      ..      ..      ..   15.68   17.35
       (iii)         ..      ..      ..      ..      ..   33.08   37.88

  1.2  (i)           ..      ..      ..      ..    0.62    0.64    0.61
       (ii)          ..      ..      ..      ..    8.96    4.37    8.84
       (iii)         ..      ..      ..      ..   14.32    6.82   14.40

  0.9  (i)           ..      ..      ..    0.74    0.86    0.87    0.76
       (ii)          ..      ..      ..    6.30    2.52    2.75    8.11
       (iii)         ..      ..      ..    8.57    2.94    3.16   10.67

  0.6  (i)           ..      ..    0.72    0.94    1.10    1.06    0.87
       (ii)          ..      ..    4.88    1.82    1.64    2.14    6.86
       (iii)         ..      ..    6.79    1.92    1.50    2.01    7.89

  0.3  (i)           ..    0.59    0.82    1.06    1.20    1.10    0.86
       (ii)          ..    4.04    1.45    1.21    1.31    1.88    6.36
       (iii)         ..    6.81    1.78    1.13    1.09    1.71    7.41

  0.0  (i)         0.44    0.60    0.81    1.00    1.06    0.94    0.74
       (ii)        3.49    1.22    0.99    1.00    1.19    1.79    6.28
       (iii)       7.88    2.03    1.23    1.00    1.11    1.90    8.54

 -0.3  (i)           ..    0.55    0.70    0.81    0.82    0.72    0.58
       (ii)          ..    0.86    0.84    0.93    1.16    1.82    6.54
       (iii)         ..    1.56    1.21    1.16    1.42    2.52   11.35

 -0.6  (i)           ..      ..    0.55    0.60    0.59    0.53    0.44
       (ii)          ..      ..    0.80    0.94    1.20    1.92    7.06
       (iii)         ..      ..    1.45    1.54    2.03    3.65   16.16

 -0.9  (i)           ..      ..      ..    0.44    0.43    0.39    0.33
       (ii)          ..      ..      ..    0.98    1.29    2.10    7.84
       (iii)         ..      ..      ..    2.21    3.02    5.44   23.69

 -1.2  (i)           ..      ..      ..      ..    0.32    0.29    0.26
       (ii)          ..      ..      ..      ..    1.42    2.44    8.94
       (iii)         ..      ..      ..      ..    4.48    8.45   33.74

 -1.5  (i)           ..      ..      ..      ..      ..    0.22    0.20
       (ii)          ..      ..      ..      ..      ..    2.64   10.14
       (iii)         ..      ..      ..      ..      ..   11.89   50.96

 -1.8  (i)           ..      ..      ..      ..      ..      ..    0.16
       (ii)          ..      ..      ..      ..      ..      ..   11.50
       (iii)         ..      ..      ..      ..      ..      ..   72.33

































The numbers given are: 

(i) $(m/2\sigma^2)$ × variance of the maximum-likelihood estimator using systematic designs
of type III.

(ii) $(m/2\sigma^2)$ × variance using randomized designs.

(iii) The efficiency of type III designs compared with the randomized scheme, i.e. (ii)/(i). 

In the points where no entries are made the process is not stationary. When the process 
is nearly unstable, the correlation is very high, and the variance of the randomized block 
estimates is consequently high. This accounts for the high value of the efficiency in these 
cases. 

4. THE EFFECT OF DEVIATIONS FROM THE AUTOREGRESSIVE FORM

4·1. The assumption that the error is due to a linear stochastic process is at best an
approximation. We shall only consider the effect of deviations from this assumption in the
first-order case. Two possible sources of error are:

(i) That a monotonic trend is present.

(ii) That the correlation does not obey the law $\rho_s = \rho^s$.

4·2. The effect of a monotonic trend on the estimate of treatment differences is reduced
by the condition that the treatments are arranged in blocks, and designs can be constructed 
so that the mean position of each treatment is approximately the same. Where the design 
consists of a number of repetitions of the same basic design we can achieve this by 
permuting the treatments subject to the restrictions of §2-3. In the case of designs of 
type II (b) this can be done by reversing every second repetition. 

A trend is therefore not likely to give a serious bias to the estimate of the treatment
differences, but a monotonic trend will lead to an overestimate of the value of the correlation
$\rho$. This will tend to reduce the estimate of the variance of treatment differences given in
equations (13) and (15) except for design II (a) when $\rho$ is less than $-1/(p-1)$, in which
case the estimate will be increased. On the other hand, since the weights in equations (12) 
and (14) are incorrect, the actual variances of the treatment differences will be increased. 
With the above exception, the effect of these errors will be cumulative in leading us to 
underestimate the variance. 

4·3. To estimate the importance of deviations from the rule $\rho_s = \rho^s$, we consider the
effect of applying the design and analysis appropriate to the first-order process, when the
errors are in fact due to a second-order process.

If the actual relation is $x_i + d_1x_{i-1} + d_2x_{i-2} = \epsilon_i$ and the relation is assumed to be
$x_i = \rho x_{i-1} + \epsilon_i$, then, since the maximum-likelihood estimator of $\rho$ is approximately equal
to the first serial correlation coefficient, $\rho$ will be estimated as $\rho_1 = -d_1/(1+d_2)$.

The treatment difference $(a_r - a_s)$ will be estimated from equation (12) using $\rho = \rho_1$, and
we denote it by $(a_r - a_s)'$. This estimate is unbiased.

The actual variance $V_1'$ of $(a_r - a_s)'$ will always be greater than the variance $V_2$ of the
estimate based on the second-order hypothesis. We also have an inaccurate estimate $V_1$ of
the variance of $(a_r - a_s)'$ derived from equation (13).

The variance $V_1'$ of $(a_r - a_s)'$ will depend not only on the parameters $d_1, d_2, \sigma^2$, but also on
the pairs of values $r, s$ and on the particular design chosen. The calculations are laborious,
and to simplify them we shall calculate only the average value of $V_1'$ over all pairs $r, s$ and
we shall assume

(i) That the design consists of $l$ repetitions of a basic design containing $q$ blocks of
$p$ treatments, so that $m = lq$.




(ii) That end-effects can be ignored. 
(iii) That the parameters are not subject to sampling errors. 
The assumption (i) makes it possible to reduce the detailed calculation of V{ to the 
calculation of pa 
var[ & ae, x (%)] = var (x) & Ap 
(ij=r,B ij=s,B t=0 
where > (z;,) is the sum over the ‘ton design and > (z,) is the sum over the / 
{ijJ=r,B {ijJ=r,L 
repetitions of the basic design. If 
1 — £22) — (1 — gla 
dy = Heh + pe es 2, 
(1—£?2) } 


and @, is the same function in £, then we have 





var[ > (20 Ed) = var (2) EAlrbu+ eoby) 


{iJ=r,L 


cov [( i! @)— eC > PO- ,, ,eol 


{iJ=r, ij=s, fit1)=r, J {tt1]=s, L 


= var (2) 3 Bilea(Es + 62") byt Co(Ea +z") Gal, 











(23) 
Pee 
= Le Pad ae we £,82(1 +) 2) 
+ z Ber(S1 + &1*)® by + Co(Es + S2*)? x) | . 
Applying these results to equation (12) and noting that fy = 2g, we have 
a 
yi = o2 fone te d,)? da? d3 ~ BACEx ie &")8 Py Sag (E2 oe 1) af 
1 ai ‘ — 
mol a +(1+d,)?— a + ey d, (1 —dg) (€,— £2) ed 


Here only £, depends on the particular design and the values of (7,8). The average of the 
variance (written var (a,—a,)’ = V;) over all pairs (r,s) is given by replacing , by their 
average values #, given by 


Pq pa 
var (x) & Bia) = pS var S  (a)—var (a) (p2+2 S (pg-HA) |. 


eH 1) ee ae 

If $l = 1$, the coefficients of $\beta_t$ in equations (23) and (24) may be more easily expressed
directly in terms of $\rho_1, \rho_2, \rho_3, \ldots$, which can be obtained directly from equation (17). If $l \neq 1$,
equation (24) must be used.

If $|\xi_1|, |\xi_2|$ are small, the variance is inversely proportional to $l$, and the results in
Table 4 show that even for quite large values of $\xi_1, \xi_2$ this approximation is fairly good.

The value of $V_1$ would be estimated from equation (13) putting

$$\rho = \rho_1 = -d_1/(1+d_2),$$

and

$$\sigma^2 = (1-\rho_1^2)\operatorname{var}(x_i) = (1-\rho_1^2)\,\frac{\sigma^2(1+d_2)}{(1-d_2)\{(1+d_2)^2 - d_1^2\}}.$$













These results were applied to the series

(a) $x_i - 0.9x_{i-1} + 0.5x_{i-2} = \epsilon_i$,

(b) $x_i - 0.3x_{i-1} - 0.5x_{i-2} = \epsilon_i$,

(c) $x_i - 0.6x_{i-1} = \epsilon_i$.

For all series $\rho_1 = 0.6$, $\operatorname{var}(\epsilon_i) = \sigma^2$.

A type III design

(12345) (24153) (14532) (51243)

was chosen so that a correct maximum-likelihood estimator would be available for
comparison. The variance of this estimator $V_2$, and the variance of the estimate under
randomization, i.e. $V_R = (2/m)\operatorname{var}(x_i)(1-\bar\rho)$ (see §2·2) are also given for comparison.














Table 4

                       (a)                       (b)                (c)

  V_1'/sigma^2   0.2661 - 0.011(2 - 1)    0.4981 + 0.003(2 - 1)     0.301
  V_1/sigma^2          0.462                     0.402              0.301
  V_2/sigma^2          0.201                     0.300              0.301
  V_R/sigma^2          0.849                     0.423              0.466




















These results suggest that in series such as (b), where the correlogram differs appreciably
from that of a first-order process, fairly serious errors can be introduced and the error
underestimated; and it is important that serious deviations from the first-order model should
be detected.





5. APPLICATION TO EXPERIMENTAL DATA 


Two sets of uniformity trial data were tested to see whether the variability could be 
described by a linear autoregressive model. 


To test the goodness of fit we used the test given by Quenouille (1947), of which the
principal result is that if the autoregressive relation is

$$x_i + d_1x_{i-1} + \ldots + d_kx_{i-k} = \epsilon_i,$$

and if

$$(1 + d_1u + d_2u^2 + \ldots + d_ku^k)^2 = \sum_{j=0}^{2k} A_ju^j,$$

$$R_s = \sum_{j=0}^{2k} A_jr_{s-j} \qquad (s = k+1, k+2, \ldots),$$

where $r_j$ is the sample serial correlation coefficient, then, if the sample number $n$ is large,
the forms $R_s$ are approximately normally distributed with zero mean and variance $\frac{1}{n}\sum_j A_jX_j$,
where $X_j = \sum_{i=-\infty}^{\infty}\rho_i\rho_{i-j}$. The forms $R_s$ $(s > k)$ are distributed independently of each other and of







$r_s - \rho_s$ $(s \le k)$. The values of $d_1, d_2, \ldots, d_k$ may be determined from the first $k$ values of $r_s$, and
the goodness of fit tested by using the forms $R_s$.
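The forms $R_s$ can be computed from a list of serial correlations as in the following sketch, which is an illustration rather than Quenouille's own procedure. The function name is hypothetical, $r_0$ is taken as 1, and $r_{-j}$ is taken equal to $r_j$ where a negative index arises.

```python
def quenouille_forms(r, d):
    """R_s for s = k+1, ..., len(r)-1.

    r[j] is the j-th sample serial correlation (r[0] = 1) and
    d = [d_1, ..., d_k] are the fitted autoregressive coefficients.
    """
    k = len(d)
    poly = [1.0] + list(d)
    # A_j: coefficients of the squared polynomial (1 + d_1 u + ... + d_k u^k)^2
    a = [0.0] * (2 * k + 1)
    for i, pi in enumerate(poly):
        for j, pj in enumerate(poly):
            a[i + j] += pi * pj
    return {
        s: sum(a[j] * r[abs(s - j)] for j in range(2 * k + 1))
        for s in range(k + 1, len(r))
    }
```

For a first-order fit with $d_1 = -r_1$ this reduces to $R_s = r_s - 2r_1r_{s-1} + r_1^2r_{s-2}$. Using the leading serial correlations of plot 2B (0.3788, 0.2112, 0.1519) gives $R_2 \approx 0.0677$ and $R_3 \approx 0.0462$.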

We denote the form $R_s$ when a first- or second-order process has been fitted by $R_{s(1)}$ or
$R_{s(2)}$ respectively.

We tested data kindly supplied by Rothamsted Experimental Station giving the yields
obtained each year from 1852 to 1946 from two of the plots (referred to as plot 2B, which
received farmyard manure, and plot 8, which received artificial manure each year) of the
Broadbalk wheat experiment, in which wheat has been grown on the same plots each year,
receiving the same manurial treatment throughout the experiment. Taking the deviations
from the mean yield for the whole period, the first serial correlation coefficient $r_1$ was found
to be significant at the 1% point for plot 2B and at the 5% level for plot 8.

The values of $r_s$, $R_s$ for both series are given in Table 5.























Table 5

             Plot 2B                        Plot 8
  s      $r_s$    $R_s^{(1)}$      $r_s$    $R_s^{(1)}$   $R_s^{(2)}$

  1     0.3788       -           0.2162       -           -
  2     0.2112     0.0677        0.2265     0.1797        -
  3     0.1519     0.0463        0.1008     0.0130      0.0203
  4     0.2140     0.1292        0.2387     0.2057      0.1748
  5     0.2862     0.1459        0.2556     0.1571      0.1596
  6     0.2501     0.0640        0.3526     0.2532      0.1950
  7     0.1366    -0.0118        0.0461    -0.0944     -0.1468
  8     0.0746     0.0070        0.1088     0.1053     -0.0041
  9     0.1076     0.0707        0.0870     0.0421      0.0653
 10     0.1939     0.1231        0.2129     0.1804      0.1603
 11     0.1067    -0.0248        0.0832    -0.0964     -0.0128
 12     0.0535     0.0005        0.0151    -0.0109     -0.0782
 13    -0.0018    -0.0270        0.0044     0.0018     -0.0125
 14    -0.0651    -0.0560        0.1173     0.1161      0.1236
 15    -0.1527    -0.1034       -0.1201    -0.1682     -0.1588
 16    -0.1496    -0.0433       -0.0982    -0.0408     -0.0959
 17    -0.1444    -0.0530       -0.0973    -0.0604     -0.0133
 18    -0.2379    -0.1500       -0.1681    -0.1306     -0.1037
 19    -0.1612    -0.0017       -0.0556     0.0625      0.0763
 20    -0.0151     0.0729       -0.0349    -0.0403      0.0154

 var($R_s$)                  -     0.00780      -     0.00967     0.00909
 $\chi^2 = \sum R_s^2/\mathrm{var}(R_s)$   -     14.97        -     30.72       24.11























For plot 2B the value of $\chi^2$ is not significant. There is a tendency for the positive and
negative values to occur together, but the number of runs of positive and negative values is
not significant at the 5% level (see Swed & Eisenhart, 1943), and the form $x_t - 0.3788x_{t-1} = \epsilon_t$
seems to fit reasonably well. For plot 8 the value of $\chi^2$ for the first-order process is just
significant at the 5% level; a second-order process $x_t - 0.1754x_{t-1} - 0.1886x_{t-2} = \epsilon_t$ gives
a satisfactory fit. The residuals obtained after fitting these series were tested for randomness,
and no significant departure from randomness was found.
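
The second-order coefficients quoted for plot 8 can be checked with a short sketch. It assumes that the coefficients were obtained by solving the two Yule-Walker equations in $r_1$ and $r_2$; the text says only that they "may be determined from the first $k$ values of $r_s$", so this choice, and the function name `yule_walker_ar2`, are illustrative assumptions rather than the author's stated procedure.

```python
def yule_walker_ar2(r1, r2):
    """Solve the Yule-Walker equations for x_t = a1*x_{t-1} + a2*x_{t-2} + e_t.

    Assumption: the fitted coefficients come from r_1 and r_2 alone.
    In the paper's notation d_1 = -a1 and d_2 = -a2.
    """
    denom = 1.0 - r1 * r1
    a1 = r1 * (1.0 - r2) / denom
    a2 = (r2 - r1 * r1) / denom
    return a1, a2


# Plot 8 of Table 5: r_1 = 0.2162, r_2 = 0.2265.
a1, a2 = yule_walker_ar2(0.2162, 0.2265)
print(round(a1, 4), round(a2, 4))
```

Under this assumption the two values agree with the coefficients 0.1754 and 0.1886 of the second-order form to four decimal places.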


























































The data used to test whether the autoregressive model could be fitted to a series of
agricultural plots arranged in a line were taken from the uniformity trial data given by
Garber, McIlvaine & Hoover (1926, p. 263, column 1). Each plot, which was sown with
wheat, measured 68 × 21 ft. There were only forty-five consecutive plots available. This
series is really too short to be satisfactory, but in the absence of other suitable data we made
use of it, while recognizing that not too much weight should be given to the result.

The observations, expressed in bushels per acre as deviations from the mean, are given
in Table 6.

Table 6 
Plot Yield Plot Yield Plot Yield Plot Yield | 
t 
1 ~05 13 0-9 25 ~ 2-0 37 — 2-4 
2 — 6-0 14 1-0 26 —0-8 38 —5-7 
3 — 0-6 15 4:0 27 1-9 39 —0-9 
at om 16 2-8 28 1-2 40 —0-2 ) 
5 —1-3 17 0-4 29 41 41 0-5 
G..c 1-7 18 0-6 30 1-7 42 —0-8 
7 | —0-1 19} —4-8 31 11-1 43 —0-7 
8s | -22 20 | 47 32 10-9 44 —4-0 
o | 365 21 — 52 33 10-9 45 -0-4 
10 | 35 22 -21 34 6-0 
ll —49 23 2-6 35 5-2 
12 | 0-7 24 —3-5 36 1-6 
Table 7

  s      $r_s$     $R_s^{(1)}(n-s)^{1/2}$

  1     0.626         -
  2     0.430       0.249
  3     0.267      -0.169
  4     0.182       0.102
  5    -0.109      -1.467
  6    -0.138       0.437
  7    -0.286      -0.962
  8    -0.224       0.487
  9    -0.262      -0.564
 10    -0.384      -0.852
 11    -0.508      -0.758
 12    -0.509      -0.138























A first-order form, $x_t - 0.6262x_{t-1} = \epsilon_t$, was fitted, and the goodness of fit tested by comparing
$\sum_{s=2}^{12} \{R_s^{(1)}(n-s)^{1/2}\}^2$ (see Table 7) with its estimated variance, $(1 - r_1^2)^2$. The value of
14.18 so obtained is not significant for 11 degrees of freedom.

We conclude that, as far as Quenouille's test can be trusted for such a small value of $n$, the
data appear to be adequately represented by a first-order process.
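
The arithmetic of this test can be sketched as follows. The sketch takes the serial correlations from Table 7 with $n = 45$, and assumes that for a first-order fit with $d_1 = -r_1$ the coefficients are $A_0 = 1$, $A_1 = -2r_1$, $A_2 = r_1^2$, so that $R_s^{(1)} = r_s - 2r_1 r_{s-1} + r_1^2 r_{s-2}$ with $r_0 = 1$. The function name `first_order_forms` is illustrative only.

```python
import math

# Serial correlations r_1 .. r_12 as printed in Table 7; n = 45 plots.
r = [0.626, 0.430, 0.267, 0.182, -0.109, -0.138,
     -0.286, -0.224, -0.262, -0.384, -0.508, -0.509]
n = 45


def first_order_forms(r):
    """Return {s: R_s} for s >= 2 after a first-order fit with d_1 = -r_1."""
    r1 = r[0]
    rho = [1.0] + list(r)              # rho[j] holds r_j, with r_0 = 1
    # (1 - r1*u)^2 = 1 - 2*r1*u + r1^2*u^2, hence A = (1, -2*r1, r1^2)
    a = (1.0, -2.0 * r1, r1 * r1)
    return {s: sum(a[j] * rho[s - j] for j in range(3))
            for s in range(2, len(rho))}


forms = first_order_forms(r)
# Scale each form by (n - s)^(1/2), as in the last column of Table 7.
scaled = {s: R * math.sqrt(n - s) for s, R in forms.items()}
# Sum of squares divided by the estimated variance (1 - r_1^2)^2.
chi2 = sum(v * v for v in scaled.values()) / (1.0 - r[0] ** 2) ** 2
dof = len(scaled)
print(dof, round(chi2, 2))
```

Because the printed $r_s$ are rounded to three decimals, the scaled forms agree with Table 7 only to about the second decimal place, and the statistic comes out close to, rather than exactly at, the quoted 14.18.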













SUMMARY 


A method of experimental design is developed for use when the errors are in the form of 
a first- or second-order linear autoregressive stochastic process. Estimates of treatment 
effects and the efficiency relative to a randomized block design applied to the same data are 
given. The effect of deviations from the first-order process on the efficiency of the design 
is dealt with. Consideration is given to the suitability of these designs for agricultural data. 


I wish to acknowledge gratefully my indebtedness to Mr F. J. Anscombe for suggesting 
this subject of research and for his advice and assistance during my investigations at the 
Statistical Laboratory, Cambridge. I also wish to thank the Department of Scientific and 
Industrial Research, New Zealand, for financial assistance at that time. 


REFERENCES 


GARBER, R. J., McILVAINE, T. C. & HOOVER, M. M. (1926). A study in soil heterogeneity in
experimental plots. J. Agric. Res. 33, 255.

HALD, A. (1948). The decomposition of a series of observations. G.E.C. Gads Forlag. Copenhagen.

KOOPMANS, T. (1942). Serial correlation and quadratic forms in normal variables. Ann. Math. Statist.
13, 14.

QUENOUILLE, M. H. (1947). A large-sample test for the goodness of fit of autoregressive schemes.
J.R. Statist. Soc. 110, 123.

QUENOUILLE, M. H. (1949). On a method of trend elimination. Biometrika, 36, 75.

SWED, F. S. & EISENHART, C. (1943). Tables for testing randomness of grouping in a sequence of
alternatives. Ann. Math. Statist. 14, 66.










THE TIME INTERVALS BETWEEN INDUSTRIAL ACCIDENTS 


By B. A. MAGUIRE, E. S. PEARSON and A. H. A. WYNN 


1. INTRODUCTION 


Statistical methods have been used for many years for studying the frequency of occurrence 
of accidents in fixed intervals of time. Statisticians are familiar with the use by Bortkiewicz 
(1898) of the Poisson distribution to analyse the numbers of men in ten Prussian army 
corps killed by the kick of a horse, quoted, for example, by Yule (1922) and still being 
quoted in text-books (Jeffreys, 1948). Again, the series of papers describing research on 
industrial accidents published by the Industrial Health Research Board, which began with 
a paper by Greenwood & Woods (1919), are based on the analysis of the frequency of accidents 
occurring in fixed intervals of time. 

It is, however, a characteristic of an accident that it occurs at a particular instant of 
time, which is often recorded. Thus the basic data consist of a sequence of ordered intervals 
of varying length. Analysis may therefore be applied to the time intervals between accidents 
rather than to the frequencies of accidents occurring in successive fixed intervals of time. 
Which of the alternative methods of analysis is the most efficient will depend on the type 
of question we are asking and the assumptions regarding the data which we are prepared 
to make. Under certain conditions the two methods of attack are clearly equivalent. Thus 
if we are prepared to assume that accidents are taking place at random in time and at 
a constant average rate, we shall obtain the same estimate of this rate* either from the 
average number of accidents occurring in successive fixed time intervals or from the 
average length of the varying interval between the accidents. 

When, however, a more detailed analysis is required, as, for example, in testing for 
changes with time, it seems likely (provided accurate interval data are available) that 
methods of analysis based on these will be more powerful than those often employed using 
only the accident frequencies in relatively long fixed intervals of time. This advantage will 
lie partly in the fact that for an analysis to be sensitive to changes in time it must be able to 
handle observations in small groups, and an exact distribution theory for the continuous 
variable (the interval) is available, where it does not exist in manageable form for the 
discontinuous variable (the frequency or count). In the present paper, however, we are not 
concerned with a comparison of the efficiency of the two methods of attack, but rather with 
an examination of available methods of handling the interval analysis. 

Any conclusion about the accident liability of a single individual must always be based 
on very small numbers even in unusually dangerous occupations. Serious accidents of 
particular kinds, such as explosions in mines, are fortunately infrequent. The extent to 
which any body of accident statistics may be usefully analysed depends in part upon 
the power of the statistical methods to generalize from these small numbers. The loss of 
information entailed in compiling accident frequencies in fixed intervals of time may, 
therefore, seriously reduce the usefulness of the records. 


* Apart from a marginal difference due to the fact that the time stretch considered may not be 
precisely the same in the two cases. 







In routine accident control it is important for a manager to have the earliest possible 
indication of a significant change in the expectation of accident. Administrative measures 
may have been taken to reduce accidents; it is important to have the earliest possible indica- 
tion of the effect of these measures. Analysis of the time intervals between accidents may 
provide an earlier indication of an improvement than any analysis of accident frequency. 

Failure in the past to use time-interval analysis for research on industrial accidents has 
been partly due to the more complicated records necessary and to the difficulty of defining 
intervals. This paper is not concerned with these practical difficulties but introduces 
a discussion of the many statistical techniques which are available for the analysis of time- 
interval data if the practical difficulties can be overcome. 

There is one note of caution which should be raised. If we take for consideration the sample 
of intervals between accidents occurring in a given period of time, e.g. in a year, the data are 
in some degree selected. For instance, no intervals of a year or more can be included in the 
sample. While this should not affect the theory seriously, provided the average interval is 
short compared to the fixed period, the matter is one requiring investigation. 


2. THE EXPONENTIAL DISTRIBUTION 


It was shown by Whitworth (1901) that if the expectation of events per unit time is constant,
then the time intervals between events are exponentially distributed. If $E$ is the expectation
of accidents per unit time, then the probability density function of time intervals may be
written

$$f(t) = E e^{-Et}. \qquad (1)$$

For this distribution, both mean and standard deviation equal $1/E$.


Table 1. Time intervals in days between explosions in mines, involving more than 
10 men killed, from 6 December 1875 to 29 May 1951 


378 286 871 66 
36 114 48 291 
15 108 123 4 
31 188 457 369 

215 233 498 338 
11 28 49 336 

137 22 131 19 

4 61 182 329 
15 78 255 330 
72 99 195 312 
96 326 224 171 

124 275 566 145 
50 54 390 75 

120 217 72 364 

203 113 228 37 

176 32 271 19 
55 23 208 156 
93 151 517 47 
59 361 1613 129 

315 312 54 1630 
59 354 326 29 
61 58 1312 217 

1 275 348 7 
13 78 745 18 

189 17 217 1357 

345 1205 120 (complete interval 
20 644 275 to 29 May 1951) 
81 467 20


Mean time interval = 241 days. 







Table 1 gives the time intervals between explosions in mines in Great Britain involving 
the loss of ten lives or more since 1875. In Fig. 1 these intervals are shown to be approxi- 
mately exponentially distributed. A study of other sequences of industrial accidents shows 
that the time intervals are usually exponentially distributed to at least a rough approximation.


Table 2. Time intervals in days between successive compensable accidents 
in one district of a mine 


3 2 + 8 3 
23 0 0 2 12 
0 0 0 0 10 
0 2 2 1 4 
2 3 1 1 0 
3 0 1 8 3 
0 0 1 0 2 
2 0 1 0 2 
1 0 8 0 0 
0 0 0 1 0 
1 3 0 8 14 
0 4 2 3 1 
0 2 0 0 + 
0 3 2 2 8 
0 5 2 0 

1 0 0 1 

0 0 2 1 

1 2 0 3 

3 + 0 2 

2 5 2 0 

1 0 8 4 


Mean time interval = 2.2653 days.











































Fig. 1. Histogram of time intervals between successive explosions in mines involving more than 10 men
killed, with fitted exponential curve (see Table 1). Horizontal axis: time intervals between disasters
in days (0 to 1650); vertical axis: number of time intervals.





Table 2 gives the time interval between successive accidents for one shift in a section of
a mine. Fig. 2 shows that the distribution is approximately exponential.

If the risk of accident were constant, the exponential distribution would be a good fit.
In order to show that there have been changes in risk or expectation of accident, tests of
homogeneity or goodness of fit are required for an exponential distribution. This method
of attack has been suggested by various writers including Bortkiewicz (1898), Morant (1920),
Neyman & Pearson (1928) and Sukhatme (1936).
















































































Fig. 2. Histogram of time intervals between successive accidents in one district of a mine in days, with
fitted exponential curve (see Table 2). Horizontal axis: time intervals between accidents in days
(0 to 24).


The exponential distribution has also been used in a more generalized form which assumes
that following each event there is a ‘closed period’, $t_0$, during which another event cannot
occur; (1) then becomes

$$f(t) = E e^{-E(t - t_0)}. \qquad (2)$$

This distribution has been applied by various writers, including Sukhatme (1936), to the
analysis of the time intervals between telephone calls. No useful application to the study
of industrial accidents appears, however, to have been discovered so far, and the simpler
distribution (1), where $t_0 = 0$, appears adequate for accident analysis.


3. THE DISTRIBUTION OF THE SUM AND MEAN OF TIME INTERVALS 


It may be shown that if time intervals are exponentially distributed, then both the sum,
$T$, and the mean, $\bar{t}$, of $n$ intervals have a Pearson type III distribution. The distribution of
the sum of $n$ intervals, say $T$, is

$$f(T) = \frac{(ET)^{n-1} e^{-ET} E}{\Gamma(n)}. \qquad (3)$$

The distribution of the mean of $n$ intervals, say $\bar{t}$, is

$$f(\bar{t}) = \frac{(nE\bar{t})^{n-1} e^{-nE\bar{t}}\, nE}{\Gamma(n)}. \qquad (4)$$

The distribution (4) has a mean of $1/E$ and standard deviation of $1/(\sqrt{n}\,E)$; as $n$ becomes
large, this distribution tends to the normal form. $\bar{t}$ is also a sufficient estimator of $1/E$.

It follows from the form of equations (1) and (4) that $2Et$ and $2nE\bar{t}$ are distributed as
$\chi^2$ with degrees of freedom $\nu = 2$ and $\nu = 2n$, respectively. This means that well-known
properties of the $\chi^2$-distribution and of related functions, as well as existing tables, may be













employed in accident interval analysis. For example, $\bar{t}$ may be used to provide confidence
limits for an unknown value of $E$. Thus if $\chi^2_{1-\alpha}$ and $\chi^2_{\alpha}$ are the lower and upper $100\alpha$% points
of the distribution of $\chi^2$, having $\nu = 2n$ degrees of freedom, there is a probability of $1 - 2\alpha$
that

$$\chi^2_{1-\alpha}/(2n\bar{t}) < E < \chi^2_{\alpha}/(2n\bar{t}).$$

This should be understood as meaning that if this estimate is made in cases where the
intervals are independent and distributed exponentially, it will be correct in the long run,
whatever be $n$, in about $100(1 - 2\alpha)$% of cases.


4. THE DISTRIBUTION OF THE RATIO OF THE MEANS OF TIME INTERVALS 
If $\chi_1^2$ and $\chi_2^2$ are two independent values of $\chi^2$ having, respectively, $\nu_1$ and $\nu_2$ degrees of
freedom, then the variance ratio $F$ is defined as

$$F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}.$$

It follows that if $\bar{t}_1$ and $\bar{t}_2$ are independent mean intervals, based respectively on $n_1$ and $n_2$
intervals, and if $E_1$ and $E_2$ are the corresponding expectations of accidents per unit time, then
$E_1\bar{t}_1/(E_2\bar{t}_2)$ will be distributed as $F$ with degrees of freedom $\nu_1 = 2n_1$, $\nu_2 = 2n_2$. Thus to test
whether $E_1 = E_2$, we may refer $\bar{t}_1/\bar{t}_2$ to the tables of the $F$-distribution. In this way we can
test very simply for significant differences between the accident risk in two different places
or during two different periods of time, assuming, of course, that time intervals are
distributed independently and exponentially.

Thus when operations started on a particular working face in a mine there were seven
compensable accidents with a mean time interval of 32 days. New machinery was then
introduced and there were twelve further accidents with a mean time interval of 21 days. The ratio
is 32/21 = 1.52. The 10% point for $F$ is 1.86 for $\nu_1 = 2 \times (7-1) = 12$ and $\nu_2 = 2 \times (12-1) = 22$
degrees of freedom. The evidence from these data that the introduction of machinery has
increased the expectation of accident is thus very weak.

As indicated in the preceding section, we may also obtain confidence limits for $E$ or $1/E$
based, say, on the value of $\bar{t} = 21$ days observed after new machinery had been introduced.
For $\nu = 22$ the lower and upper 5% points for $\chi^2$ are 12.34 and 33.92 respectively. Hence we
may state that

$$\frac{22 \times 21}{33.92} < \frac{1}{E} < \frac{22 \times 21}{12.34}, \quad \text{or} \quad 13.6 < \frac{1}{E} < 37.4 \text{ days},$$

with a probability of $1 - 2 \times 0.05 = 0.90$ of being correct, provided that the intervals follow
the exponential law.
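
These limits can be reproduced without tables. The sketch below is an illustration, not the authors' procedure: it uses the fact that for an even number of degrees of freedom the upper tail of $\chi^2$ is a finite Poisson sum, and it finds the percentage points by bisection. The function names are hypothetical.

```python
import math


def chi2_sf_even(x, nu):
    """P(chi-square with even nu d.f. exceeds x), as a finite Poisson sum."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, nu // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_upper_point(alpha, nu):
    """Value x with P(chi2_nu > x) = alpha, located by bisection."""
    lo, hi = 0.0, 10.0 * nu + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_interval_limits(mean_interval, n, alpha=0.05):
    """Limits for 1/E from the mean of n exponential intervals."""
    nu = 2 * n
    lower_pt = chi2_upper_point(1.0 - alpha, nu)   # lower 100*alpha % point
    upper_pt = chi2_upper_point(alpha, nu)         # upper 100*alpha % point
    total = nu * mean_interval                     # 2 * n * t-bar
    return total / upper_pt, total / lower_pt, lower_pt, upper_pt


# Eleven intervals (nu = 22) with a mean of 21 days, as in the example above.
low, high, lower_pt, upper_pt = mean_interval_limits(21.0, 11)
print(round(lower_pt, 2), round(upper_pt, 2), round(low, 1), round(high, 1))
```

The restriction to even degrees of freedom is no handicap here, since $\nu = 2n$ is always even in interval analysis.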

As a further illustration, consider the following case. A district in a mine had been working
for over two years and there had been sixty-three compensable accidents. The mean time
interval was 13.7 days. It became necessary to transfer a group of men to a new district,
replacing the experienced men by new face workers. Immediately after the change there
were five accidents with a mean interval of 1.8 days. The ratio of the means is $F = 7.6$; the
numbers of degrees of freedom for $F$ are 124 and 8, which give a 1% point of about 4.9. The
change resulted, therefore, in a highly significant increase in the expectation of accident.
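
Where tables of $F$ are not to hand, the tail probability for the two illustrations of this section can be computed directly. The sketch below is an assumption-laden illustration: it relies on both degrees of freedom being even (as they always are here), in which case the incomplete beta function reduces to a finite binomial sum. The name `f_upper_tail` is hypothetical.

```python
from math import comb


def f_upper_tail(f, nu1, nu2):
    """P(F > f) for even nu1 and nu2.

    With a = nu1/2 and b = nu2/2, F > f is equivalent to a Beta(a, b)
    variate exceeding p = a*f / (a*f + b), and for integer a, b this
    equals P(Binomial(a + b - 1, p) <= a - 1).
    """
    a, b = nu1 // 2, nu2 // 2
    p = a * f / (a * f + b)
    m = a + b - 1
    return sum(comb(m, j) * p ** j * (1.0 - p) ** (m - j) for j in range(a))


# New machinery: ratio 32/21 on (12, 22) degrees of freedom.
p_machinery = f_upper_tail(32.0 / 21.0, 12, 22)
# Transfer of men: ratio 13.7/1.8 on (124, 8) degrees of freedom.
p_transfer = f_upper_tail(13.7 / 1.8, 124, 8)
print(round(p_machinery, 3), round(p_transfer, 4))
```

The first probability lies well above 0.10 and the second below 0.01, in line with the conclusions drawn from the tabled percentage points.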


5. THE EXTREME OBSERVATIONS IN SAMPLES FROM AN EXPONENTIAL DISTRIBUTION 


The $\chi^2$- and $F$-distributions may both be usefully applied to the analysis of time intervals
between accidents. Their application depends, however, on the assumption of homogeneity.

Particular tests for homogeneity may be based on the distribution of the extreme intervals.
Range tests are particularly useful for indicating significantly long periods of immunity
from accident. Thus in Table 1 it will be noted that there are intervals of 1613 and 1630 days
between disasters. Is there evidence that the risk of disaster was lower during these periods?

(a) Distribution of $t_n/(n\bar{t})$

If $t_n$ is the largest among $n$ independent time intervals and $\bar{t}$ the mean of the $n$ intervals,
then if we let $g = t_n/(n\bar{t})$, the probability that $g$ exceeds a given value $G$ is shown by Fisher
(1929) to be

$$\Pr\{g > G\} = n(1-G)^{n-1} - \tfrac{1}{2}n(n-1)(1-2G)^{n-1} + \dots + (-1)^{k-1}\frac{n!}{k!\,(n-k)!}(1-kG)^{n-1},$$

where $k$ is the largest integer less than $1/G$. 5 and 1% significance levels for $g$ up to $n = 50$
have been tabulated by Fisher (1929, 1950), who has shown that a good approximation
can often be obtained from the first term of the series.

Application to data in Table 1. Here $n = 109$, $\bar{t} = 241$, $t_n = 1630$. The significant value of
$g$ at the 5% significance level is $g_{0.05} = 0.068703$. Hence $t_n$ is significant if

$$t_n > g_{0.05} \times n\bar{t} = 1805 \text{ days}.$$

Our longest interval, 1630, is therefore not significant at the 5% level.

Application to data in Table 2. $n = 98$, $n\bar{t} = 222$, $t_n = 23$. We find that $g = 0.1036$, and
from the formula above, of which the first term is sufficient,

$$\Pr\{g > 0.1036\} = 0.0024,$$

so that having regard to the mean of 2.27 days, the largest interval of 23 days is clearly
significant.
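
Fisher's series is easily summed numerically. The following sketch evaluates it for the Table 2 figures ($n = 98$, $t_n = 23$, sum of intervals 222) and compares the full series with its first term; the function name `fisher_g_tail` is illustrative.

```python
from math import comb


def fisher_g_tail(G, n):
    """Pr{g > G}, summing terms k = 1, 2, ... for which k*G < 1."""
    total = 0.0
    k = 1
    while k <= n and k * G < 1.0:
        total += (-1) ** (k - 1) * comb(n, k) * (1.0 - k * G) ** (n - 1)
        k += 1
    return total


# Table 2: largest interval 23 days out of a total of 222 days, n = 98.
p_table2 = fisher_g_tail(23 / 222, 98)
first_term = 98 * (1 - 23 / 222) ** 97
print(round(p_table2, 4), round(first_term, 4))
```

For these data the second and later terms change the result by less than one unit in the sixth decimal place, which illustrates why the first term alone is often sufficient.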
(b) Distribution of $t_n/t_1$

Hartley (1951) has recently considered the ratio, $F_{\max} = s^2_{\max}/s^2_{\min}$, where $s^2_{\max}$ and
$s^2_{\min}$ are the largest and smallest out of $k$ independent mean-square estimates of variance
each based on the same number of degrees of freedom $\nu$. He has also shown that when $\nu = 2$

$$\Pr\{F_{\max} < F\} = k\int_0^\infty e^{-y}\,(e^{-y} - e^{-Fy})^{k-1}\,dy. \qquad (5)$$

Clearly if $t_n$ and $t_1$ are the largest and smallest of $n$ time intervals, their ratio $t_n/t_1$ will
follow Hartley's distribution (5). It must be remembered, however, that relatively small
inaccuracies due to recording the shortest interval to the nearest day, week or month have
a large influence on the value of $t_n/t_1$.

In Table 1 the ratio $t_n/t_1 = 1630/1$, which is not significant at the 5% level. However, since
$t_1$, recorded to the nearest day, is given in this case as 1, we can only say that $0.5 < t_1 < 1.5$;
thus $F_{\max}$ may lie between 3260 and 1087. Indeed the shortest time interval may in practical
cases often be recorded as zero, as in the list of intervals in Table 2 between successive
accidents in one district of a mine. It is therefore not likely that the ratio $t_n/t_1$ will be found
very useful in this field of application.

The distributions of $t_n$ and $t_n - t_1$ may be useful if there is a satisfactory basis for assuming
a hypothetical value for the expectation of accidents.











(c) Distribution of $t_n$

We have for $t_n$,

$$\Pr\{t_n > \tau\} = 1 - (1 - e^{-E\tau})^n, \qquad (6)$$

where $E$ is the expectation of accidents per unit time.

Application to data in Table 1. $n = 109$, $E = 0.004150$. We find from (6) that with 109
observations the 5% significance level of $Et_n$ is 7.5268. Thus if the true value of $E$ were as
found for this 75-year period, the 5% significance level for the largest interval would be
1813 days. The observed value, $t_n = 1630$, falls within this limit. It also falls within the
10% limit, calculated on the same basis.

Application to data in Table 2. $n = 98$, $E = 0.4414$. The 5% significance level of $Et_n$ is
7.4866. Using the observed $E$, this gives 16.96 as the significant value of $t_n$. The largest
interval, 23, falls beyond this level.

This test, as applied in these illustrations, is of course not exact, as $E$ has been determined
from the data.
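
As a rough check on these two applications, the tail probability in (6) may be evaluated at the observed largest intervals. The sketch below does only this; it treats $E$ as known and equal to the value estimated from the data, which is the same inexactness noted above, and the function name is hypothetical.

```python
import math


def largest_interval_tail(tau, rate, n):
    """Pr{t_n > tau} from equation (6), with rate standing for E."""
    return 1.0 - (1.0 - math.exp(-rate * tau)) ** n


# Table 1: n = 109, E = 0.004150, largest interval 1630 days.
p_table1 = largest_interval_tail(1630, 0.004150, 109)
# Table 2: n = 98, E = 0.4414, largest interval 23 days.
p_table2 = largest_interval_tail(23, 0.4414, 98)
print(round(p_table1, 3), round(p_table2, 4))
```

The first probability exceeds 0.10 and the second is well below 0.05, which agrees with the statements that the Table 1 value falls within the 10% limit while the Table 2 value falls beyond the 5% level.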

(d) Distribution of $t_n - t_1$

For the variate $y = t_n - t_1$, we have

$$\Pr\{y > Y\} = \int_Y^\infty (n-1) E e^{-Ey} (1 - e^{-Ey})^{n-2}\,dy = 1 - (1 - e^{-EY})^{n-1}, \qquad (7)$$

a result very similar to (6) except that $n - 1$ replaces $n$. If $n$ is large these tests are very
nearly the same. For $n$ small it may be that the test based on $t_n - t_1$ is more powerful.


6. THE M-TEST


Tests may be applied to discover whether there is any significant tendency for intervals to
succeed one another in groups, sometimes longer, sometimes shorter. A failure of the overall
distribution of intervals to correspond with the exponential might be due to such a tendency;
within short periods the exponential law might hold but $E$ might change from period to
period. The whole sequence may be broken into $k$ groups of $n$ successive intervals and may
be tested for a significant difference between the $k$ mean intervals, these mean intervals being
each estimates of $1/E$.

If $\bar{t}_i$ is the mean interval in the $i$th group of $n$ consecutive intervals $(i = 1, 2, \dots, k)$, where
the intervals are exponentially distributed with parameter $E_i$, then the probability density
function for $\bar{t}_i$ is given by

$$f(\bar{t}_i)\,d\bar{t}_i = \frac{1}{\Gamma(n)} (nE_i\bar{t}_i)^{n-1} e^{-nE_i\bar{t}_i}\,d(nE_i\bar{t}_i), \qquad (8)$$

or $2nE_i\bar{t}_i$ is distributed like $\chi^2$ with $\nu = 2n$ degrees of freedom, i.e. the mean intervals
$\bar{t}_1, \bar{t}_2, \dots, \bar{t}_k$ of the $k$ groups of $n$ successive intervals will, on our hypothesis, be distributed
independently as $\chi^2/(2E_i n)$.

If the data are homogeneous, the $E_i$ will be equal. This can be examined by using the
standard test for homogeneity of variances in samples from a normal population, commonly
called Bartlett's test, though in the case of equal groups it is identical with the test given
by Neyman and Pearson in 1931.*


* See note on p. 180 added in proof. 








The test criterion to be calculated is*

$$M = 2kn\left\{\log_e\left(\frac{1}{k}\sum_{i=1}^{k} T_i\right) - \frac{1}{k}\sum_{i=1}^{k}\log_e T_i\right\}, \qquad (9)$$

where $T_i = n\bar{t}_i$ is the sum of $n$ intervals in the $i$th group. Bartlett (1937) shows that $M/C$
is distributed approximately as $\chi^2$ with $\nu = k - 1$, where

$$C = 1 + (k+1)/(6nk). \qquad (10)$$

For $n$ small and $k \le 15$, the tables computed by Thompson & Merrington (1946) give the
5 and 1% points for $M$.

If $\nu$ is large we may have to make use of the fact that, to a good approximation,
$\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ is a normal deviate with unit standard deviation. The M-test may also be
used when the numbers of intervals in each group are unequal, the necessary modifications
to equation (9) being given in the papers quoted.

The data of Table 2 were broken up into fourteen consecutive groups of $n = 7$ intervals,
the fourteen sums, $T_i$, calculated and $M$ obtained from equation (9) as follows:

$$M = 196\,(\log_{10} 2.2653 - \tfrac{1}{14} \times 3.99028)\,\log_e 10 = 31.639.$$

Further, $C = 1 + 15/(6 \times 98) = 1.0255$.

Hence $M/C = 30.85$ with 13 degrees of freedom. Regarded as $\chi^2$, $M/C$ is just significant
at the 0.5% level. There appears therefore to be evidence of fluctuations in accident
expectation with time during the period considered.
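
The criterion and its correction factor can be sketched in a few lines. The sketch assumes equal groups of $n$ intervals and takes $M = 2kn\{\log_e(\text{mean of the } T_i) - \text{mean of } \log_e T_i\}$, which is the reading of equation (9) consistent with the worked figures; those figures are in turn read as $196(\log_{10} 2.2653 - 3.99028/14)\log_e 10$. The function names are illustrative, and the individual group sums for Table 2 are not reconstructed here.

```python
import math


def bartlett_m(sums, n):
    """M for k group sums T_i, each the sum of n intervals (all T_i > 0)."""
    k = len(sums)
    mean_sum = sum(sums) / k
    mean_log = sum(math.log(t) for t in sums) / k
    return 2 * k * n * (math.log(mean_sum) - mean_log)


def bartlett_c(n, k):
    """Correction factor C = 1 + (k + 1) / (6 n k) of equation (10)."""
    return 1.0 + (k + 1) / (6.0 * n * k)


# Summary figures quoted in the text for Table 2 (k = 14 groups of n = 7).
m_quoted = 196 * (math.log10(2.2653) - 3.99028 / 14) * math.log(10)
c = bartlett_c(7, 14)
ratio = m_quoted / c
print(round(m_quoted, 2), round(c, 4), round(ratio, 2))

# Hypothetical group sums, for illustration only: equal sums give M = 0.
print(bartlett_m([12.0, 12.0, 12.0], 4), bartlett_m([3.0, 12.0, 30.0], 4) > 0)
```

A zero group sum makes $\log_e T_i$ undefined, so a practical version would need to decide how such groups are to be handled before calling `bartlett_m`.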

The M-test may also be applied to single intervals, i.e. when $n = 1$, but here, in practice,
the application may be impossible if the record gives zero intervals, making some values of
$\log_e T_i = -\infty$. Table 1 contains no zeros, and we find that the set of single intervals is just
significant at the 10% level, $M$ being 155.589. Thus the evidence that $E$ is varying is slight.

It will be noticed that the results of applying the M-test in this section confirm those
obtained with the g-test in §5 above; the intervals in Table 1 are consistent with a constant
accident expectation while those in Table 2 are not.

If a sequence of intervals is broken up into small groups in the way suggested and 
M calculated, we are perhaps using the best method available of testing whether the 
intervals follow a common exponential law and are arranged in random order. The test 
does, however, involve an arbitrary element in the sense that the division points separating 
the group of n intervals and the value of n itself are at the statistician’s choice. In the ideal 
test this element would be eliminated; but to derive such a test more thought may be 
needed as to the kind of alternatives to homogeneity which may be expected to occur. 


7. A FURTHER EXAMPLE 


We shall take as an example to which several methods of analysis may be applied some 

data for the intervals between accidents causing fatalities in the mines of Great Britain. 

The seven lists in Table 3 correspond to the seven divisions of the National Coal Board and 
* In Bartlett's (1937) notation, $M = -2\log_e \mu$.













the units of time are days. The period covered is 245 days in 1950. The first and last intervals 
in each division are not intervals between accidents, but between the start of the period 
and the first accident and between the last accident and the end of the period; they must 
therefore be omitted from any interval analysis. 








Table 3 
Div. 1 Div. 2 Div. 3 Div.4| Div. 5 Div. 6 Div. 7 
16 2 4 13 8 12 0 16 21 4 7 3 17 2 0 
3 2 9 4 4 1 13 8 2 5 9 20 3 3 1 
1 6 0 «60 1 7 2 24 15 1 10 7 & 7 1 
0 14 ll 3 3 1 13 6 1 13 4 12 1 4 15 
7 10 14s ‘. 3 5 5 6 2 0 21 2 
"is m4 63 5 5 14 1 9 6 23 00 5 
206 3 0 15 4 9 ll 2 9 3 9 13 6 5 3 
SS: = a i | 6 60 ll 1 — 14 19 1 3 14 
"9 r 2 © 5 0 8 = 0 12 46 2 
20 3 2 8 5 s $s 16 it. = , es 26 5 
5 15 26 3 «#O 2 8 8 o — e — 0 0 24 
10 13 2 3 2 13 5 98 1 — 5 — 0 1 0 
4 4 4 1 13 8 1 1 24 — 1 — 14 7 1 
6 8 4 0 2 1 13 3 14 — 14 — 2 3 0 
19 — 6 1 3 I 16 4 — 2— 10 5 8 
1 — 0 oO OO 4 17 9 9 — 9 — 22— 
9 — & § 3 3 5 — 2 — 8s — 6 0 — 
a a ae GE — — = a a 
eo o> +. s it — = rok = $-o — 
2— as 6 0 ll — as 1 — 10 — 133 2— 
ces a. 2  % 6 — — — — i — ..) — 





























If we are prepared to assume that accidents occur randomly in time and that the accident
expectation was constant within a division during the 245 days, and merely want to test
whether there is evidence that the rate differs between divisions, we may apply the simple
$\chi^2$ test to the total number of accidents. Here we assume that the frequency of accidents
in a fixed period of time is a Poisson variable. The frequency of accidents in each division,
$n_j$, $j = 1, 2, \dots, 7$, is given in Table 4, and we find that

$$\chi^2 = \sum_{j=1}^{7} (n_j - \bar{n})^2/\bar{n} = 44.23 \text{ with } \nu = 6,$$

where $\bar{n} = 37.14$ is the average number of accidents. This result is clearly very significant
so that we must conclude that the differences among the $n_j$ are real.*
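
The statistic is simple enough to verify directly from the divisional counts of Table 4. The sketch below also repeats the calculation with Division 4 left out; the function name `dispersion_chi2` is illustrative.

```python
# Accident counts n_j for Divisions 1 to 7, as given in Table 4.
counts = [34, 62, 37, 15, 27, 29, 56]


def dispersion_chi2(counts):
    """Sum of (n_j - mean)^2 / mean, with its degrees of freedom and the mean."""
    mean = sum(counts) / len(counts)
    stat = sum((c - mean) ** 2 for c in counts) / mean
    return stat, len(counts) - 1, mean


chi2_all, dof_all, mean_all = dispersion_chi2(counts)
# The same statistic with Division 4 (index 3) omitted.
chi2_no4, dof_no4, _ = dispersion_chi2(counts[:3] + counts[4:])
print(round(chi2_all, 2), dof_all, round(mean_all, 2))
print(round(chi2_no4, 2), dof_no4)
```

Since every division was observed for the same 245 days, the expected count under a common rate is simply the mean count, which is why no weighting by exposure appears.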

Allowing, therefore, $E$ to vary between divisions, we may ask whether the intervals
within a division appear to follow the exponential law of equation (1), with values of $E_j$
which may differ. A rapid check on this is obtained from Fisher's g-test, the necessary
information being given in Table 4, where as before $t_n$ is the largest of $n$ intervals in a
division, $T$ is the sum of the intervals and $g = t_n/T$.† The number of intervals for the
$j$th division is, of course, $n_j - 1$.

For $n_j - 1 \le 50$, 5 and 1% significance levels are shown, taken from Fisher's (1950) table.
For the other two cases, the probability level corresponding to the observed $g$ is shown. The


* Even when Division 4 with its exceptional interval of 98 days is omitted, $\chi^2 = 26.22$ with $\nu = 5$ is
still a highly significant value.
† The suffices $j$, which strictly should be given, have been dropped for simplicity.




only exceptional result is for Division 4, where $g$ exceeds the 1% level; this draws attention
to the single long interval of 98 days. None of the other six values of $g$ is significant at the
5% level. The exponential distribution has, of course, a very long ‘tail’, and the test is
mainly helping us to determine whether apparently outlying observations are, in fact,
exceptional. To derive a more critical test of departure from the exponential, we must
specify the form of departure to which it is wished that the test should be sensitive.

The M-test may also be applied to these data. Allowing for between-divisional differences
in $E_j$, it searches for fluctuations in accident risk during the course of the 245-day period.
Omitting the first and last intervals, we have summed the intervals within each division
into $k$ consecutive groups of $n = 4$, omitting the last intervals where the total number was
not a multiple of 4. The results are shown in Table 5, where $M$ and the Bartlett corrective
factor $C$ have been defined in equations (9) and (10). The only value of $\chi^2$ which can be
regarded as significant is that for Division 7, which falls just beyond the 0.5% point. When
the 7 values of $\chi^2$ are summed, we find that for $\nu = 54$



























































$$\Pr\{\chi^2 > 74.20\} = 0.036.$$

Table 4

 Division                                      Significance levels
   (j)      $n_j$   $t_n$    $T$      $g$          5%         1%

    1        34      20     221    0.0905      0.1835     0.2237
    2        62      26     234    0.1111      Pr{g > 0.1111} = 0.057
    3        37      18     228    0.0789      0.1712     0.2086
    4        15      98     220    0.4455      0.3517     0.4272
    5        27      44     221    0.1991      0.2212     0.2699
    6        29      23     226    0.1018      0.2088     0.2547
    7        56      24     220    0.1091      Pr{g > 0.1091} = 0.105

Table 5

 Division     $k$      $M$       $C$     $\chi^2 = M/C$    $\nu = k - 1$

    1          8      5.41     1.047       5.17              7
    2         15     18.66     1.044      17.87             14
    3          9      8.12     1.046       7.76              8
    4          3      3.88     1.056       3.67              2
    5          6      6.92     1.049       6.60              5
    6          7      4.50     1.048       4.30              6
    7         13     30.13     1.045      28.83             12

  Total                                   74.20             54

















Without Division 7, the y?’s sum to 45-32 which for 42 degrees of freedom is not exceptional. 

Have we any justification in concluding that the intervals for Division 7 show significant 
changes in risk? We are faced here with the arbitrary character referred to above of our 
selection of the consecutive group of four intervals. If instead of taking the groups 


Biometrika 39 12 








178 The time intervals between industrial accidents


(2,4,1,2), (0,6,1,4), ..., we start two intervals later with (1,2,0,6), (1,4,2,0), ..., and
proceed as before we obtain

\[ \chi^2 = M/C = 17.58, \]

a value which, for $\nu = 12$, is not significant at the 10 % level. In other cases, which we have
examined, shifting the position of the grouping divisions has not led to so large a change
in $\chi^2$. However, the limitation of the test must be recognized. In this case we cannot apply
the test to single intervals and so avoid ambiguity (as was possible in examining the data
in Table 1) because of the presence of zero intervals in the data as recorded.

Finally, we think it must be concluded that: 

(a) there is definite evidence of differences in accident expectation between Divisions; 

(b) the interval of 98 days in Division 4 is exceptional; 

(c) apart from this we have not established inconsistency with the hypothesis that the 
intervals are randomly and exponentially distributed within a division. 


8. VARIATIONS IN THE EXPECTATION 
If $E$ is the expectation of accidents per unit time and is a constant parameter, it has been
shown that the distribution of time intervals between events is exponential and is given
by the probability density function

\[ f(t) = E e^{-Et}. \tag{1} \]

The distribution of accidents per unit time is the well-known Poisson distribution, which is
discontinuous, the probability of $r$ accidents in time $\tau$ being

\[ p(r) = \frac{e^{-E\tau}(E\tau)^r}{r!}. \tag{11} \]

If $E$ is itself a random variable with probability density function $h(E)$, $0 < E < \infty$, (1) may
be transformed to give a new distribution of time intervals

\[ f(t) = \int_0^\infty E\,h(E)\,e^{-Et}\,dE. \tag{12} \]

This will be recognized as the Laplace transform. The tables of Laplace transforms and the
considerable literature on their application to quite different problems may find some
application to the analysis of interval distributions. See for example Carslaw & Jaeger
(1941) and Doetsch (1947). Again, if $E$ is a variable with probability distribution $h(E)$ as
above, (11) may similarly be transformed to give the probability of intervals in time $\tau$

\[ P(r) = \frac{\tau^r}{r!}\int_0^\infty E^r h(E)\,e^{-E\tau}\,dE. \tag{13} \]

(13) is also a Laplace transform.
As an example, $E$ may be assumed to be distributed in Pearson type III form

\[ h(E) = \frac{c^q}{\Gamma(q)}\,E^{q-1}e^{-cE}. \tag{14} \]

Substitution in (12) and integration gives

\[ f(t) = \frac{q}{c}\left(1 + \frac{t}{c}\right)^{-(q+1)}. \tag{15} \]

This J-shaped curve is a form of the Pearson type XI distribution. Substitution of the
Pearson type III distribution (14) in (13) gives the general term of the negative binomial
distribution found by Greenwood & Yule (1920):

\[ P(r) = \left(\frac{c}{c+1}\right)^q \frac{q(q+1)\cdots(q+r-1)}{r!\,(c+1)^r}, \tag{16} \]



B. A. Maguire, E. S. Pearson and A. H. A. Wynn 179


where $\tau$ is taken to be unity. The Pearson type XI distribution is thus seen to be the
distribution of the time intervals between events which have a negative binomial frequency
distribution and a Pearson type III probability distribution of the expectation of events
per unit time.

Every distribution of time intervals is seen to have two related distributions; the frequency 
distribution of events in fixed intervals of time, and the frequency distribution of the 
expectation of events per unit time. Their relationship is stated in general terms by (12) 
and (13). 
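The link between (12), (13) and the closed forms (15) and (16) can be checked numerically. The Python sketch below integrates the mixtures with a simple midpoint rule and compares them with the closed forms; the function names, the parameter values $c = 3$, $q = 2.5$ and the truncation of the integral at $E = 40$ are illustrative assumptions, not taken from the paper.

```python
import math


def type3_density(E, c, q):
    """Pearson type III (gamma) density h(E) of equation (14)."""
    return c**q / math.gamma(q) * E ** (q - 1) * math.exp(-c * E)


def midpoint_integral(func, lower, upper, steps=100_000):
    """Plain midpoint rule; adequate for these smooth, rapidly decaying integrands."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def interval_density_numeric(t, c, q, upper=40.0):
    """Equation (12): mixture of exponential interval densities."""
    return midpoint_integral(
        lambda E: E * type3_density(E, c, q) * math.exp(-E * t), 0.0, upper
    )


def interval_density_closed(t, c, q):
    """Equation (15)."""
    return (q / c) * (1.0 + t / c) ** (-(q + 1))


def count_probability_numeric(r, c, q, tau=1.0, upper=40.0):
    """Equation (13): mixture of Poisson probabilities."""
    integral = midpoint_integral(
        lambda E: E**r * type3_density(E, c, q) * math.exp(-E * tau), 0.0, upper
    )
    return tau**r / math.factorial(r) * integral


def count_probability_closed(r, c, q):
    """Equation (16), the negative binomial term with tau = 1."""
    rising = 1.0
    for j in range(r):
        rising *= q + j          # q (q + 1) ... (q + r - 1)
    return (c / (c + 1.0)) ** q * rising / (math.factorial(r) * (c + 1.0) ** r)


if __name__ == "__main__":
    c, q = 3.0, 2.5
    print(interval_density_numeric(1.0, c, q), interval_density_closed(1.0, c, q))
    print(count_probability_numeric(2, c, q), count_probability_closed(2, c, q))
```

The two numbers on each printed line should agree to several decimal places, which is a check on the reconstruction of (15) and (16) rather than a proof.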


9. CONCLUSIONS 


It is apparent that many statistical tests, often already developed and tabulated for other 
purposes, may be usefully adapted to analyse industrial accident data if the difficulties of 
collecting and recording can be overcome. 

None of the tests described in this paper demonstrates lack of homogeneity in the series 
of time intervals in Table 1. No significance attaches to the long intervals of 1613 and 
1630 days or to the short intervals of 1 and 4 days. Such extremes might well occur during 
a period of 76 years even if the expectation of accident were constant. It is not suggested, 
however, that these tests are exhaustive or are the most sensitive that could be devised. 
Further, the conclusions only apply to one class of explosions in mines; milder explosions 
involving one man or more killed, or serious disasters involving fifty or more men killed may 
not provide homogeneous series. A thorough study of time intervals between explosions 
in mines would take into consideration the number of men killed and the number at risk. 

It has been seen that the time intervals in Table 2 are not homogeneous. The range test, 
in particular, provides a method of identifying a time interval when the risk of accident 
was sufficiently low. 

The problems of industrial accident control are statistically analogous to those of 
industrial quality control and similar practical techniques, including sequential procedures,
can almost certainly be developed to help in their solution. There are, however, very 
important differences between quality control and accident control. It is easy to define 
categories of accident in which men are killed or in which men receive compensation, but 
many accidents are difficult to place in satisfactory categories. The problems of accidents
are more human and more complicated than those of quality control. It is usually more 
important to extract the maximum amount of information from industrial accident data 
than from the data obtained from inspectors’ samples. In quality control the size of sample 
can be chosen, but in accident control the data must be examined from a sample which may 
be any size and cannot be chosen by the investigator. Finally, accidents happen at a 
particular time and, as the time of occurrence is a valuable part of the data, techniques 
based on the analysis of time intervals, unfamiliar in quality control, are needed. 

The analysis of intervals may be used to study not only events distributed in time, but also
points distributed in space. The exponential distribution was, indeed, probably first used
to describe the distribution of free paths of molecules of a perfect gas. The analysis of 
intervals may be usefully applied to a much wider range of problems than hitherto. 


Acknowledgement is made to the Ministry of Fuel and Power for permission to publish 
this paper.













REFERENCES 


Bartlett, M. S. (1937). Proc. Roy. Soc. A, 160, 268.
Bortkiewicz, L. (1898). Bull. Inst. Statist. 20 (2).
Carslaw, H. S. & Jaeger, J. C. (1941). Operational Methods in Applied Mathematics. Oxford: Clarendon Press.
Doetsch, G. (1947). Tabellen zur Laplace Transformation und Anleitung zum Gebrauch. Berlin: Springer.
Fisher, R. A. (1929). Proc. Roy. Soc. A, 125, 54.
Fisher, R. A. (1950). Contributions to Mathematical Statistics, 16, 59a. London: Chapman and Hall.
Greenwood, M. (1946). J.R. Statist. Soc. 109, 85.
Greenwood, M. & Woods, H. M. (1919). Industrial Fatigue Research Board Report, 4. H.M.S.O.
Greenwood, M. & Yule, G. U. (1920). J.R. Statist. Soc. 83, 255.
Hartley, H. O. (1951). Biometrika, 37, 271.
Jeffreys, H. (1948). The Theory of Probability, 2nd ed. Oxford University Press.
Moran, P. A. P. (1951). J.R. Statist. Soc., Series B, 13, 147.
Morant, G. M. (1920). Biometrika, 13, 309.
Neyman, J. & Pearson, E. S. (1928). Biometrika, 20A, 175.
Neyman, J. & Pearson, E. S. (1931). Bull. Int. Acad. Cracovie, A, 460.
Sukhatme, P. V. (1936). Statist. Res. Mem. 1, 94.
Thompson, C. M. & Merrington, M. (1946). Biometrika, 33, 296.
Whitworth, W. A. (1901). Choice and Chance, 5th ed. Cambridge: Deighton Bell and Co.
Yule, G. U. (1922). J.R. Statist. Soc. 85, 95.


Note added in proof. The publication of a recent paper by Moran (1951) has drawn our 
attention to the fact that in the discussion following a paper by Greenwood (1946), 
regarding industrial accidents, Bartlett suggested that known tests for homogeneity of 
variances could be used in the analysis of random time intervals. In his paper, Moran 
supposes that as an alternative to the exponential distribution of time intervals of 
equation (1), the probability density function has the Type III form 

\[ f(t) = (pE)^p\,e^{-pEt}\,t^{p-1}/\Gamma(p). \]

He then shows that in testing the hypothesis that $p = 1$, i.e. that equation (1) is true,
against this class of alternatives, the likelihood ratio criterion will be a function of the $M$
of equation (9), with $n = 1$.

The departure from randomness which Moran contemplates is, of course, different from
that which we have considered, i.e. a situation in which the accident expectation $E$ may
change from time to time, but the distribution of $t$ remains exponential.














THE ESTIMATION OF DEATH-RATES FROM CAPTURE- 
MARK-RECAPTURE SAMPLING 


By P. A. P. MORAN 
Institute of Statistics, University of Oxford 


Leslie & Chitty, in a recent paper (1951), have discussed the problem of the estimation of 
death-rates from the results of capture-mark-recapture samplings of animal populations. 
Their paper constitutes an important advance but leaves open a number of theoretical 
problems, and the theory underlying some of the methods used is still obscure. The purpose 
of the present paper is to clarify the assumptions made and the logic of some of the 
methods. 

The basic idea is as follows. At successive points of time, which we take to be equidistant, 
a sample of animals is captured, marked with some mark distinctive of that particular 
sampling, and released. At each sampling the number of animals showing each of the various 
possible combinations of previous marks is recorded. From these records it is possible, under 
various assumptions, to make an estimate of the death-rate. We suppose the samplings to
take place at times $t_0, t_1, \ldots, t_i, \ldots$ which are equidistant. Let $N_0$ be the unknown total
population at time $t_0$ and $R_0$ the size of the sample then taken. Each member of this sample
is given the mark ‘0’. Having returned this sample to the population another sample of
size $R_1$ is taken at time $t_1$. The number of this sample which possess the mark ‘0’ is defined
to be $r_{01}$, and the rest as $u_1$, so that $R_1 = r_{01} + u_1$. Similarly, the sample at time $t_2$, of size $R_2$,
is composed of $r_{02}$ members with marks ‘0’ and ‘2’ but not ‘1’, $r_{12}$ with marks ‘1’ and ‘2’
but not ‘0’, and of $r_{012}$ with all three marks ‘0’, ‘1’ and ‘2’. Finally, there are $u_2$ individuals
with no previous mark. Similarly, the sample of size $R_k$, taken at time $t_k$, is made up of
$u_k$ individuals not previously marked, together with various groups of members $r_{ab\ldots jk}$ (which
have been previously marked at times $a, b, \ldots, j$), etc. For $t_0$, $t_1$, $t_2$ and $t_3$ the possible classes
are set out in Table 1.




















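Table 1

Time    Expected number in population              Sample value
t_0                                                 u_0
        Total                                       R_0

t_1     P R_0                                       r_01
        N_1 - P R_0                                 u_1
        Total                                       R_1

t_2     P r_01                                      r_012
        P^2 R_0 - P r_01                            r_02
        P u_1                                       r_12
        N_2 - P^2 R_0 - P u_1                       u_2
        Total                                       R_2

t_3     P r_012                                     r_0123
        P^2 r_01 - P r_012                          r_013
        P r_02                                      r_023
        P r_12                                      r_123
        P^3 u_0 - P^2 r_01 - P r_02                 r_03
        P^2 u_1 - P r_12                            r_13
        P u_2                                       r_23
        N_3 - P^3 u_0 - P^2 u_1 - P u_2             u_3
        Total                                       R_3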



















We now assume that there is a constant death-rate acting during each period of time
$(t_i, t_{i+1})$ so that a proportion $Q = 1 - P$ die off. Thus of the $R_0$ animals marked at time $t_0$ the
expected number surviving at time $t_1$ is $PR_0$. Let $N_1$ be the total population at time $t_1$. We
do not take this to be a random variable nor do we specify it in any way as we wish to allow
for an arbitrary number of births (or immigration but not emigration) during the period
$(t_0, t_1)$. Similarly, at time $t_2$, we suppose the expected number of animals in the population
with marks ‘0’ and ‘1’ to be $Pr_{01}$, with mark ‘0’ but not ‘1’ to be $P^2R_0 - Pr_{01}$, with mark
‘1’ but not ‘0’ to be $Pu_1$, and without any mark to be $N_2 - P^2R_0 - Pu_1$, where $N_2$ is the total
population at time $t_2$. Then in the sample of $R_2$ animals taken at time $t_2$ the observed values
$r_{012}$, $r_{02}$, $r_{12}$ and $u_2$ will have expectations proportional to these quantities. The corresponding
sample values, with the expected numbers in the population at time $t_3$, are also given in
Table 1, which can be easily extended to later values of $t$. We also notice that Table 1 gives
the expected values in the population at each time conditional on the sample values occurring 
in the previous samples. These are the quantities we need if we wish to write down an 
expression for the probability of the whole observed series of samples. 
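As a concrete illustration of the $t_2$ entries of Table 1, the following Python sketch computes the expected population classes and the corresponding expected sample composition. The function names, the dictionary keys and the numerical values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_population_t2(P, R0, r01, u1, N2):
    """Expected numbers in the population at time t_2, conditional on the
    sample values observed at t_1 (the t_2 block of Table 1)."""
    return {
        "marks 0 and 1": P * r01,
        "mark 0 only": P**2 * R0 - P * r01,
        "mark 1 only": P * u1,
        "unmarked": N2 - P**2 * R0 - P * u1,
    }


def expected_sample_t2(P, R0, r01, u1, N2, R2):
    """Expected values of r_012, r_02, r_12 and u_2 in a sample of size R_2,
    taken as proportional to the population classes."""
    population = expected_population_t2(P, R0, r01, u1, N2)
    return {name: R2 * count / N2 for name, count in population.items()}


if __name__ == "__main__":
    # Hypothetical figures: survival rate 0.5, 100 animals marked at t_0,
    # 20 recaptured and 30 newly marked at t_1, population 200 at t_2.
    print(expected_population_t2(0.5, 100, 20, 30, 200))
    print(expected_sample_t2(0.5, 100, 20, 30, 200, 40))
```

With these figures the four population classes are 10, 15, 15 and 160, which sum to $N_2 = 200$ as they must.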

We have still not properly defined the sampling model involved, and we here run into 
some difficulty. Leslie & Chitty distinguish ‘deterministic’ and ‘probabilistic’ models, 
but in their sampling experiment they use a model of a third type which we shall call the
‘semi-probabilistic model’. The distinction between the three models arises from the
exact meaning given to the survival rate P. 


THE DETERMINISTIC MODEL 


In this model $P$ is taken to be the exact proportion of the population, in each class, surviving
from one sampling time to the next. Then in order to write down the exact distribution at
time $t_1$, say, $PR_0$ must be an integer. $r_{01}$ is then distributed in a hypergeometric distribution,
since the sample of $R_1$ is a random sample out of a total finite population of $N_1$ members,
$PR_0$ of which are marked ‘0’ and $N_1 - PR_0$ unmarked. $Pr_{01}$ cannot then, with certainty,
be asserted to be a whole number. An exact distribution theory for this model cannot therefore
be set up unless $P = 1$. We must therefore approximate to the truth by ignoring this
difficulty for the moment and supposing that $PR_0$, $Pr_{01}$, $P^2R_0 - Pr_{01}$, etc., can be regarded
as integers. At each stage we then assume that we have random sampling from a finite
population divided into several classes. If such a population consists of $n$ members of which
$n_1, \ldots, n_k$ belong to classes $1, \ldots, k$ ($\sum n_i = n$), and we take a random sample of $m$ ($m < n$), the
probability of obtaining $m_1, \ldots, m_k$ ($\sum m_i = m$) in these classes is

\[ \binom{n}{m}^{-1}\prod_{i=1}^{k}\binom{n_i}{m_i}. \tag{1} \]
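Equation (1) is the multiple hypergeometric probability, and a minimal Python sketch of it is given below. The function name and the example class sizes are illustrative assumptions; exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import comb, prod


def multiple_hypergeometric(class_sizes, sample_counts):
    """Probability of drawing sample_counts[i] members from class i when a
    random sample is taken without replacement from a population whose
    classes have sizes class_sizes (equation (1))."""
    if len(class_sizes) != len(sample_counts):
        raise ValueError("one sample count is needed per class")
    n = sum(class_sizes)      # population size
    m = sum(sample_counts)    # sample size
    ways = prod(comb(n_i, m_i) for n_i, m_i in zip(class_sizes, sample_counts))
    return Fraction(ways, comb(n, m))


if __name__ == "__main__":
    # Hypothetical population of 10 in classes of 5, 3 and 2; sample of 4.
    print(multiple_hypergeometric((5, 3, 2), (2, 1, 1)))
```

For the hypothetical classes shown, the probability is $\binom{5}{2}\binom{3}{1}\binom{2}{1}/\binom{10}{4} = 60/210$.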


We apply this result to the above scheme, noticing that the distribution of the $k$th sample
is completely determined by what has already happened in samplings at $t_0, \ldots, t_{k-1}$. The
probability to be attached to the first three samplings is therefore

\[ \binom{N_1}{R_1}^{-1}\binom{PR_0}{r_{01}}\binom{N_1 - PR_0}{u_1}\;\binom{N_2}{R_2}^{-1}\binom{Pr_{01}}{r_{012}}\binom{P^2R_0 - Pr_{01}}{r_{02}}\binom{Pu_1}{r_{12}}\binom{N_2 - P^2R_0 - Pu_1}{u_2}. \tag{2} \]


This expression can easily be extended to further sampling stages. It clearly cannot represent
an exact model of what is happening unless $P = 1$ (when $P \neq 1$ the above expressions may be
interpreted in terms of Beta functions). However, we can expect to obtain a good estimator
for $P$ in an actual situation by maximizing (2) for variations in $P$. This maximization is
somewhat awkward, and Leslie & Chitty therefore approximate to (2) by replacing the
multiple hypergeometric distributions by multinomial distributions. (2) then becomes

\[ \binom{R_1}{r_{01}}\frac{(PR_0)^{r_{01}}(N_1 - PR_0)^{u_1}}{N_1^{R_1}}\cdot\frac{R_2!\,(Pr_{01})^{r_{012}}(P^2R_0 - Pr_{01})^{r_{02}}(Pu_1)^{r_{12}}(N_2 - Pu_1 - P^2R_0)^{u_2}}{r_{012}!\,r_{02}!\,r_{12}!\,u_2!\,N_2^{R_2}}. \tag{3} \]

This is now a genuine probability distribution whatever the value of $P$ in the range
(0, 1), and we may therefore regard this as a probability distribution defining the ‘deterministic
model’, although it cannot be other than an approximation to what really happens.
If we now wish to estimate $P$, $N_1$ and $N_2$ we maximize (3) for variations in these quantities
(the estimation of $N_0$ being, of course, impossible). In their paper Leslie & Chitty confine
themselves to the estimation of $P$. To do this, they ignore the distribution of the $u$'s and
consider the conditional distribution, at each stage, of the $r$'s, keeping the $u$'s fixed. This is
clearly valid, and the expression for the conditional likelihood which is to be maximized
can then be written (for three sampling times)

\[ \frac{(R_2 - u_2)!\,(Pr_{01})^{r_{012}}(P^2R_0 - Pr_{01})^{r_{02}}(Pu_1)^{r_{12}}}{r_{012}!\,r_{02}!\,r_{12}!\,(Pu_1 + P^2R_0)^{R_2 - u_2}}, \tag{4} \]

since the distribution at sampling time $t_1$ contributes no information about $P$. Omitting
constant factors and cancelling $P$ from the numerator and denominator of (4), the expression
to be maximized is

\[ \frac{(PR_0 - r_{01})^{r_{02}}}{(u_1 + PR_0)^{R_2 - u_2}}. \]
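The maximization for three sampling times can be sketched in Python as follows. The grid search is only one convenient way of locating the maximum, and the closed-form root used as a cross-check is obtained here by setting the derivative of the logarithm to zero; it is our own derivation, not a formula quoted from Leslie & Chitty. All function names and numerical values are hypothetical.

```python
import math


def log_likelihood(P, R0, r01, u1, r012, r02, r12):
    """Logarithm of (P R_0 - r_01)^{r_02} / (u_1 + P R_0)^{R_2 - u_2}."""
    marked_in_sample = r012 + r02 + r12          # this is R_2 - u_2
    return r02 * math.log(P * R0 - r01) - marked_in_sample * math.log(u1 + P * R0)


def estimate_by_grid(R0, r01, u1, r012, r02, r12, steps=10_000):
    """Grid search over admissible P, i.e. those with P R_0 > r_01."""
    candidates = [i / steps for i in range(1, steps + 1) if (i / steps) * R0 > r01]
    return max(candidates, key=lambda P: log_likelihood(P, R0, r01, u1, r012, r02, r12))


def estimate_closed_form(R0, r01, u1, r012, r02, r12):
    """Root of d(log-likelihood)/dP = 0 (assumes r_012 + r_12 > 0).
    The result is not constrained to lie in (0, 1]."""
    marked_in_sample = r012 + r02 + r12
    return (r02 * u1 + marked_in_sample * r01) / (R0 * (r012 + r12))


if __name__ == "__main__":
    # Hypothetical data: R_0 = 100, r_01 = 20, u_1 = 30, r_012 = 6, r_02 = 9, r_12 = 9.
    print(estimate_by_grid(100, 20, 30, 6, 9, 9))
    print(estimate_closed_form(100, 20, 30, 6, 9, 9))
```

The hypothetical counts were chosen so that the sample classes are exactly proportional to their expectations under $P = 0.5$, and both routines return that value.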


The corresponding expressions for further stages can easily be written down. Thus for four
stages we include the distribution at $t_3$ which, omitting a constant factor, is

\[ \frac{(Pr_{01} - r_{012})^{r_{013}}(P^2u_0 - Pr_{01} - r_{02})^{r_{03}}(Pu_1 - r_{12})^{r_{13}}}{(P^2u_0 + Pu_1 + u_2)^{R_3 - u_3}}. \tag{5} \]

This procedure does not, in fact, throw away any information about $P$ (apart from that
resulting from using such approximations as equation (3)) and Leslie has shown (results not
yet published) that the joint estimation of $P$ and the $N$'s leads to the present method being
used to estimate $P$ first and the $N$'s afterwards. Of course, if some restraint is placed on the
$N_i$, as, for example, taking them to be all equal, or declining exponentially, some small,
but probably very small, loss of information will result from the use of the present theory.

When more than three samplings are involved, the maximization of the likelihood becomes 
a fairly lengthy numerical job and various methods of grouping terms together have 
therefore been proposed. Such methods may throw away information, but do make the 
calculations easier. Before discussing them we consider the two other models. 


THE PROBABILISTIC MODEL 


We now suppose that $P$ really is a probability of survival. Taking it to be the same for each
animal, the population at sampling stage 1 may be written

\[ R_{01}, \quad U_1, \]

where $R_{01}$ is the total number of animals in the population with the mark 0 and $U_1 = N_1 - R_{01}$
is the rest. $R_{01}$ is a random variable distributed in a binomial distribution with total $R_0$ and
expectation $PR_0$. Then, the sample taken at time $t_1$ is a random sample out of this population.
A similar situation exists at later times. Thus in this model probability enters in two ways: 
first in the probability distribution of the survivors in each marked class, and second in the 
randomness of the sample taken at each time ¢;. This set-up, although providing a real 
probability model and being much closer to what is actually happening in practice, results 
in a very much more complicated theory which I have not attempted to work out. However, 
it does seem intuitively clear that the variance of the estimator obtained will be larger than 
for the deterministic model. This suggests that in actual practice the use of the estimate of 
variance provided by the deterministic model will suggest a greater practical accuracy than 
is justified. Although the sampling experiments carried out by Leslie & Chitty use what is 
here called the ‘semi-probabilistic’ model the same point should hold. It is therefore 
somewhat curious that in actual fact their estimates of variance, based on the hypothesis 
of a deterministic model, are very close to the observed variance of the estimators in their 
sampling experiment. 
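The extra randomness of the probabilistic model can be illustrated, in a limited way, for the single count $r_{01}$. The Python sketch below compares the exact variance of $r_{01}$ when the number of marked survivors is fixed at $PR_0$ with its variance when that number is binomial, using the law of total variance. It assumes that $N_1$ is a known fixed number, and it concerns the sample count only, not the estimator of $P$; the function names and figures are hypothetical.

```python
from math import comb


def hypergeometric_mean_var(population, marked, sample):
    """Mean and variance of the number of marked animals in a random sample."""
    fraction_marked = marked / population
    mean = sample * fraction_marked
    var = (sample * fraction_marked * (1 - fraction_marked)
           * (population - sample) / (population - 1))
    return mean, var


def variance_deterministic(N1, R0, P, R1):
    """Variance of r_01 when exactly P R_0 marked animals survive
    (P R_0 is rounded to an integer, as the deterministic model requires)."""
    survivors = round(P * R0)
    return hypergeometric_mean_var(N1, survivors, R1)[1]


def variance_probabilistic(N1, R0, P, R1):
    """Variance of r_01 when the number of marked survivors is Binomial(R_0, P),
    assuming N_1 is fixed: E[Var(r|survivors)] + Var(E[r|survivors])."""
    weights = [comb(R0, s) * P**s * (1 - P) ** (R0 - s) for s in range(R0 + 1)]
    moments = [hypergeometric_mean_var(N1, s, R1) for s in range(R0 + 1)]
    overall_mean = sum(w * m for w, (m, _) in zip(weights, moments))
    within = sum(w * v for w, (_, v) in zip(weights, moments))
    between = sum(w * (m - overall_mean) ** 2 for w, (m, _) in zip(weights, moments))
    return within + between


if __name__ == "__main__":
    # Hypothetical figures: N_1 = 200, R_0 = 100, P = 0.5, R_1 = 50.
    print(variance_deterministic(200, 100, 0.5, 50))
    print(variance_probabilistic(200, 100, 0.5, 50))
```

With these figures the second variance exceeds the first by roughly a fifth, which is in the direction the argument above suggests.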


THE SEMI-PROBABILISTIC MODEL 


One of the difficulties in setting up the probabilistic model is that there is a non-zero, albeit
very small, probability that so many animals die that there are fewer than $R_i$ left at time $t_i$.
$R_i$ cannot therefore be prescribed before the experiment begins. Leslie & Chitty avoid
this difficulty by altering the meaning of $P$. Instead of $P$ representing a fixed proportion
surviving in each class (as in the deterministic model), or a probability of each individual
surviving, they suppose that it represents a fixed proportion of the total population which
survives from $t_i$ to $t_{i+1}$, but that the $N_iP$ individual animals which do survive are chosen
at random without reference to their markings. Unless the total of marked animals in the
population is both small and at the same time a large fraction of the total number, the
actual numbers surviving in each marked class will have probability distributions which
will be very closely approximated by those of the probabilistic model. The advantage of this
scheme in an experiment with an urn and counters is that it is easily carried out and can
be arranged so that at each time $t_i$ there are always more than $R_i$ animals in the population
to be sampled. As pointed out above the introduction of this new element of randomness 
should result in our estimator of P having a variance which is inflated above that of the 
deterministic model. It is possible that the reason why this is not observed in Leslie & Chitty’s 
experiments is that the scale of the experiment is not large enough for the estimator of 
variance to be really accurate. 


METHODS OF GROUPING 


We now return to the deterministic model and suppose that the probability of the observed
values is given by (4), multiplied by (5) if sampling at $t_3$ is included, and by further factors
if necessary. Consider first the case where samplings at $t_0$, $t_1$ and $t_2$ only are taken. To estimate
$P$ the natural thing to do is to take the logarithm of (4) and maximize it with respect to
variations in $P$. Ignoring the difficulties resulting from the approximate nature of the
model, this method is clearly most efficient, at least asymptotically, but in practice the
calculations quickly become lengthy if four or more samplings are used. It has therefore
been suggested that before writing down the formulae for the likelihood, the various $r$'s
should be grouped together and two such methods of grouping have been used. The first
of these (method A), used by Jackson (1939, 1948) and Fisher & Ford (1947), is the simplest
numerically but appears to throw away some information. Moreover, its logical basis is
very obscure, and this obscurity was the principal reason for writing the present paper.


























Table 2

k_01 = r_01                                    s_1 = k_01
k_02 = r_012 + r_02
k_12 = r_012 + r_12                            s_2 = k_02 + k_12
k_03 = r_0123 + r_013 + r_023 + r_03
k_13 = r_0123 + r_013 + r_123 + r_13
k_23 = r_0123 + r_023 + r_123 + r_23           s_3 = k_03 + k_13 + k_23

Table 3

         t_1                      t_2                               t_3
Expected      Sample     Expected            Sample     Expected                      Sample
P R_0         k_01       P^2 R_0             k_02       P^3 R_0                       k_03
                         P R_1               k_12       P^2 R_1                       k_13
                                                        P R_2                         k_23
P R_0         s_1        P^2 R_0 + P R_1     s_2        P^3 R_0 + P^2 R_1 + P R_2     s_3

















Roughly speaking, the idea is to consider the distribution of the individual marks in the
observed sample, rather than the distribution of the marked individuals. The $r$'s are
grouped into $k$'s in the manner shown in Table 2, the $u$'s are ignored, and we set up a joint
distribution for the $k$'s. From this we see that $k_{ij}$ is the number of animals in the sample taken
at time $t_j$ which have a mark put on at time $t_i$. The expected number in the population at
time $t_j$ with this mark is $P^{j-i}R_i$, so that we may now set out a scheme as in Table 3,
following Leslie & Chitty (1951, p. 273). The $s_j$ are the total number of marks corresponding
to $t_0, \ldots, t_{j-1}$ found on the animals at the sampling at time $t_j$. The simplicity of this table
compared with Table 1 is one of the reasons for the attractiveness of method A. Leslie &
Chitty then write for the log-likelihood (at time $t_2$ say) expressions such as

\[ k_{02}\log P - s_2\log(PR_0 + R_1), \tag{6} \]

a factor $P$ and a constant having been omitted. This is apparently the method used by
Fisher & Ford and amounts to assuming that for constant $s_2$, $k_{02}$ is distributed binomially
with probability

\[ \frac{P^2R_0}{P^2R_0 + PR_1}. \tag{7} \]

We are thus trying to infer the value of $P$ by referring the observed sample values to
a hypothetical population of all samples at time $t_2$ which give the same total $s_2$ of marks,
and so we are dealing with a population of marks instead of a population of marked animals.
It is nevertheless puzzling that we can treat $k_{02}$ as a binomial variate, even approximately,
when we notice that $k_{02}$ and $k_{12} = s_2 - k_{02}$ have the quantity $r_{012}$ in common.
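The grouping of Table 2 and the log-likelihood (6) can be sketched in Python for the sampling at $t_2$. The closed-form maximizer below follows from setting the derivative of (6) to zero and is our own derivation, included only as an illustration; the function names and figures are hypothetical.

```python
import math


def group_marks_t2(r012, r02, r12):
    """Table 2 grouping at time t_2: returns (k_02, k_12, s_2)."""
    k02 = r012 + r02
    k12 = r012 + r12
    return k02, k12, k02 + k12


def method_a_log_likelihood(P, k02, s2, R0, R1):
    """Expression (6): k_02 log P - s_2 log(P R_0 + R_1)."""
    return k02 * math.log(P) - s2 * math.log(P * R0 + R1)


def method_a_estimate(k02, k12, R0, R1):
    """Value of P at which the derivative of (6) vanishes (assumes k_12 > 0)."""
    return k02 * R1 / (R0 * k12)


if __name__ == "__main__":
    # Hypothetical data: r_012 = 6, r_02 = 9, r_12 = 9, R_0 = 100, R_1 = 50.
    k02, k12, s2 = group_marks_t2(6, 9, 9)
    print(k02, k12, s2, method_a_estimate(k02, k12, 100, 50))
```

Note that an animal carrying both marks is counted in $k_{02}$ and in $k_{12}$, so $s_2$ counts marks rather than animals, which is the point at issue in the text.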
















To see what is happening we return to a simpler case. We suppose $P = 1$ and $N_0 = N_1 = N_2$.
We can then take the joint distribution of $r_{01}$, $r_{012}$, $r_{02}$ and $r_{12}$ to be given by the distribution
(2) which is exact when $P = 1$. It will now be shown that if we first average over all possible
values of $k_{01}$, keeping $k_{02}$ and $k_{12}$ fixed and then consider the joint distribution of $k_{02}$ and $k_{12}$
(i.e. we consider the whole distribution at times $t_1$ and $t_2$ and not just $t_2$ alone) then they are
independently distributed in hypergeometric distributions with the expectations based on
Table 3 and with $P = 1$. We shall also see that this result is, in fact, almost obvious
intuitively.

Now if $k_{02}$ and $k_{12}$ are thus independently distributed and if their expectation is fairly
small compared with $R_2$, they can be regarded as approximately distributed in Poisson
distributions. Assuming this, the conditional distribution of one of them, given that
the sum of the two is fixed, is a binomial distribution as assumed in the expression for
likelihood (6). For suppose $x$ and $y$ are variates independently distributed in Poisson
distributions with means $\lambda$ and $\mu$. The joint probability of $x$ and $y$ is

\[ \frac{e^{-\lambda-\mu}\lambda^x\mu^y}{x!\,y!}, \]

and the conditional distribution of $x$, given that $x + y = N$, is

\[ \frac{e^{-\lambda-\mu}\lambda^x\mu^y}{x!\,y!}\bigg/\frac{e^{-\lambda-\mu}(\lambda+\mu)^N}{N!} = \frac{N!}{x!\,(N-x)!}\left(\frac{\lambda}{\lambda+\mu}\right)^x\left(\frac{\mu}{\lambda+\mu}\right)^{N-x}. \]

The Poisson distribution is the only distribution for which this is true. 

In fact it can be proved (Moran, 1952) that if $x$ and $y$ are two independent variates with
distributions on the non-negative integers, and if the distribution of x given that of x+y is 
binomial, then, with the exception of a trivial case, x and y must each be distributed in 
Poisson distributions, so that the Poisson distribution is a necessary condition for the above 
formula to hold exactly. 
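The binomial form of the conditional distribution is easy to confirm numerically. In the Python sketch below the conditional probability is computed directly from the two Poisson distributions and compared with the binomial term; the function names and the means 2 and 3 are illustrative choices.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


def conditional_pmf(x, total, lam, mu):
    """Pr{x | x + y = total} for independent Poisson x (mean lam) and y (mean mu)."""
    joint = poisson_pmf(x, lam) * poisson_pmf(total - x, mu)
    normaliser = sum(
        poisson_pmf(j, lam) * poisson_pmf(total - j, mu) for j in range(total + 1)
    )
    return joint / normaliser


def binomial_pmf(x, total, p):
    return math.comb(total, x) * p**x * (1 - p) ** (total - x)


if __name__ == "__main__":
    lam, mu, total = 2.0, 3.0, 10
    for x in range(total + 1):
        print(x, conditional_pmf(x, total, lam, mu), binomial_pmf(x, total, lam / (lam + mu)))
```

The two columns printed for each $x$ agree to rounding error, with binomial probability $\lambda/(\lambda+\mu)$.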

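Now taking $P = 1$ and $N_0 = N_1 = N_2 = N$, say, the exact probability of the observed
values $r_{01}$, $r_{012}$, $r_{02}$ and $r_{12}$ is

\[ \binom{N}{R_1}^{-1}\binom{N}{R_2}^{-1}\binom{R_0}{r_{01}}\binom{N-R_0}{R_1-r_{01}}\binom{r_{01}}{r_{012}}\binom{R_0-r_{01}}{r_{02}}\binom{R_1-r_{01}}{r_{12}}\binom{N-R_0-R_1+r_{01}}{u_2} \]

\[ = \binom{N}{R_1}^{-1}\binom{N}{R_2}^{-1}\left\{\frac{R_0!\,(N-R_0)!}{r_{02}!\,r_{12}!\,r_{012}!\,(R_2-r_{02}-r_{12}-r_{012})!}\right\}A, \tag{8} \]

where $A = \{(R_0-r_{01}-r_{02})!\,(R_1-r_{01}-r_{12})!\,(r_{01}-r_{012})!\,(N-R_0-R_1+r_{01}-u_2)!\}^{-1}$.

We now have to sum over all values of $r_{01}$, keeping $r_{02}$, $r_{12}$ and $r_{012}$ fixed. It is easy to see that
the range of permissible values for $r_{01}$ is given by

\[ r_{012} \le r_{01} \le \min(R_1 - r_{12},\, R_0 - r_{02}) = m \ \text{(say)}, \]

both these limits being attainable. Summing over these values we have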


\[ \sum A = \{(R_1-r_{012}-r_{12})!\,(N-R_1-R_2+r_{012}+r_{12})!\}^{-1}\sum_{r_{01}=r_{012}}^{m}\binom{R_1-r_{012}-r_{12}}{r_{01}-r_{012}}\binom{N-R_1-R_2+r_{012}+r_{12}}{R_0-r_{01}-r_{02}} \]

\[ = \{(R_1-k_{12})!\,(N-R_1-R_2+k_{12})!\}^{-1}\sum_{r_{01}=r_{012}}^{m}\binom{R_1-k_{12}}{r_{01}-r_{012}}\binom{N-R_1-R_2+k_{12}}{R_0-r_{01}-r_{02}}, \]

which is $\{(R_1-k_{12})!\,(N-R_1-R_2+k_{12})!\}^{-1}$ multiplied by the coefficient of $x^{R_0-k_{02}}$ in

\[ (1+x)^{R_1-k_{12}}(1+x)^{N-R_1-R_2+k_{12}} = (1+x)^{N-R_2}. \]

This coefficient is

\[ \binom{N-R_2}{R_0-k_{02}}. \]


The joint probability of the observed values at the third sampling, when averaged over all
values at the second, can therefore be written

\[ \binom{N}{R_1}^{-1}\binom{N}{R_2}^{-1}\left\{\frac{R_0!\,(N-R_0)!\,(N-R_2)!}{r_{02}!\,r_{12}!\,r_{012}!\,(R_2-r_{02}-r_{12}-r_{012})!\,(R_0-k_{02})!\,(N-R_0-R_2+k_{02})!\,(R_1-k_{12})!\,(N-R_1-R_2+k_{12})!}\right\}. \tag{9} \]

We now want to keep $k_{02}$ and $k_{12}$ fixed and sum (9) over all possible values of $r_{012}$, which can
clearly take all values between 0 and $\min(k_{02}, k_{12})$ inclusive. Consider therefore the sum

\[ \sum_{r_{012}=0}^{\min(k_{02},k_{12})}\{r_{02}!\,r_{12}!\,r_{012}!\,(R_2-r_{02}-r_{12}-r_{012})!\}^{-1} = \{k_{02}!\,(R_2-k_{02})!\}^{-1}\sum_{r_{012}=0}^{\min(k_{02},k_{12})}\binom{k_{02}}{r_{012}}\binom{R_2-k_{02}}{k_{12}-r_{012}}. \]

The sum is the coefficient of $x^{k_{12}}$ in the expression

\[ (1+x)^{k_{02}}(1+x)^{R_2-k_{02}}, \]

and so the above is equal to

\[ \{k_{02}!\,(R_2-k_{02})!\}^{-1}\binom{R_2}{k_{12}}. \]


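Inserting this in (9) we get

\[ \binom{N}{R_1}^{-1}\binom{N}{R_2}^{-1}\frac{R_0!\,(N-R_0)!\,R_2!\,(N-R_2)!}{(R_0-k_{02})!\,(N-R_0-R_2+k_{02})!\,(R_1-k_{12})!\,(N-R_1-R_2+k_{12})!\,k_{02}!\,(R_2-k_{02})!\,k_{12}!\,(R_2-k_{12})!} \tag{10} \]

\[ = \binom{N}{R_2}^{-2}\left\{\frac{R_0!\,(N-R_0)!\,R_1!\,(N-R_1)!}{(R_0-k_{02})!\,(N-R_0-R_2+k_{02})!\,(R_1-k_{12})!\,(N-R_1-R_2+k_{12})!\,(R_2-k_{12})!\,k_{12}!\,k_{02}!\,(R_2-k_{02})!}\right\} \]

\[ = \binom{N}{R_2}^{-1}\binom{R_0}{k_{02}}\binom{N-R_0}{R_2-k_{02}}\;\binom{N}{R_2}^{-1}\binom{R_1}{k_{12}}\binom{N-R_1}{R_2-k_{12}}, \tag{11} \]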


which shows that $k_{02}$ and $k_{12}$ are independently distributed in hypergeometric distributions.
It must be noticed that this result has been reached by averaging out the results of the
sampling at $t_1$ and that we are not here dealing with a conditional distribution at time $t_2$
given that at time $t_1$. The result expressed by (11) can in fact be seen intuitively. For the
process of three successive samplings of the same $N$ animals can be regarded as three
independent actions of the experimenter in each of which a certain mark is put on numbers
$R_0$, $R_1$ and $R_2$ of the animals selected at random.

Consider those animals which were selected and marked at time $t_2$. At times $t_0$ and
$t_1$, marks ‘0’ and ‘1’ are put on $R_0$ and $R_1$ animals, respectively, in an independent and
random manner. Then $k_{02}$, which is the number of animals in the sample at time $t_2$ which
possess the mark ‘0’, and may or may not possess the mark ‘1’, is clearly distributed in
a hypergeometric distribution which is independent of the similar distribution of $k_{12}$.
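For a very small population the independence result (11) can be confirmed by brute-force enumeration. The Python sketch below treats the three samplings as three independent random subsets of the same $N$ animals (the case $P = 1$ with a closed population), tabulates the joint distribution of $k_{02}$ and $k_{12}$, and compares it with the product of two hypergeometric probabilities. The function names and the sizes $N = 6$, $R_0 = 3$, $R_1 = 2$, $R_2 = 3$ are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def hypergeometric_pmf(k, population, marked, sample):
    return Fraction(
        comb(marked, k) * comb(population - marked, sample - k),
        comb(population, sample),
    )


def joint_distribution_of_k(N, R0, R1, R2):
    """Exact joint distribution of (k_02, k_12) over all equally likely
    choices of the three samples, i.e. averaged over the sampling at t_1."""
    animals = range(N)
    counts = {}
    total = 0
    for marked0 in combinations(animals, R0):
        set0 = set(marked0)
        for marked1 in combinations(animals, R1):
            set1 = set(marked1)
            for sample2 in combinations(animals, R2):
                k02 = sum(1 for animal in sample2 if animal in set0)
                k12 = sum(1 for animal in sample2 if animal in set1)
                counts[(k02, k12)] = counts.get((k02, k12), 0) + 1
                total += 1
    return {key: Fraction(value, total) for key, value in counts.items()}


if __name__ == "__main__":
    N, R0, R1, R2 = 6, 3, 2, 3
    for (k02, k12), prob in sorted(joint_distribution_of_k(N, R0, R1, R2).items()):
        product = hypergeometric_pmf(k02, N, R0, R2) * hypergeometric_pmf(k12, N, R1, R2)
        print(k02, k12, prob, product)
```

The enumeration covers only 6000 equally likely outcomes for these sizes, so it runs quickly; it checks one small case and is not a substitute for the algebra above.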
















It is difficult to see how to give an exact mathematical discussion of the case where $P \neq 1$.
Presumably one might take (3) and sum out $r_{01}$ and then $r_{012}$, but this seems to be very
difficult mathematically. However, it seems intuitively clear that $k_{02}$ and $k_{12}$ will still be
independent.

We can clearly extend the above type of argument to discuss the joint distribution of
$k_{03}$, $k_{13}$ and $k_{23}$. Here we meet another and more important difficulty in this type of grouping.
For (taking the case $P = 1$) the joint distribution of $k_{03}$, $k_{13}$ and $k_{23}$ can be shown to consist
of the product of three independent hypergeometric distributions, when $r_{01}$, $r_{012}$, $r_{02}$ and $r_{12}$
are averaged out. This means that even if we are justified in using a multinomial distribution
for $k_{03}$, $k_{13}$ and $k_{23}$ (as an approximation) we are not considering a distribution at time $t_3$
conditional on what has already happened, but are averaging over all possible previous
results. But as can be seen from the discussion of method A given by Leslie & Chitty, this
method involves writing down the part of the likelihood at time $t_3$ as if it did not depend on
the previous results and so we are not using the true likelihood of the whole experiment. 
This introduces a further approximation and possibly a bias of unknown extent into 
method A. 

Thus we may sum up as follows: (a) For reasons of mathematical convenience it appears 
necessary to use a model which does not take account of some randomness in the situation 
and therefore estimates of variance are likely to be less than the true values by an unknown 
amount. (b) Methods of grouping can be used to cut down the computational labour but 
of these the method used by Jackson and by Fisher & Ford throws away information and 
has other theoretical objections. The method used by Leslie & Chitty is fully efficient, 
under the assumption of the deterministic model, and appears to be well worth the extra 
labour required. 


REFERENCES 


Fisher, R. A. & Ford, E. B. (1947). Heredity, 1, 143.
Jackson, C. H. N. (1939). J. Anim. Ecol. 8, 238.
Jackson, C. H. N. (1948). Ann. Eugen., Lond., 14, 91.
Leslie, P. H. & Chitty, D. H. (1951). Biometrika, 38, 269.
Moran, P. A. P. (1952). Proc. Camb. Phil. Soc. (in the Press).




MISCELLANEA 


A note on the design problem 
By K. D. TOCHER, Imperial College, London 


If an experiment gives rise to a series of uncorrelated observations of equal variance $\sigma^2$, denoted by the
vector $y$, and we have $E(y) = a\theta$, where $a$ is a matrix of known constants and full rank, and $\theta$ is a vector
of unknown parameters, then Markoff's theorem gives the minimum variance unbiased estimates of the
parameter $\theta$ as $\hat\theta = (a'a)^{-1}a'y$ and $V(\hat\theta) = (a'a)^{-1}\sigma^2$.

The central problem of design in experimentation is the choice of the elements of $a$ to minimize the
diagonal elements of $(a'a)^{-1}$. The well-known solution of this problem is that $a$ shall have orthogonal
columns. It may be of some interest to give a short and simple proof of this which is independent of 
Hadamard’s theorem or the ideas of correlation used in previous proofs. 

The proof hinges on the triangular resolution of a positive definite symmetric matrix (Turing, 1948). 
This solution forms the basis of several methods of inverting matrices rapidly gaining popularity, but 
its value as a technique in theoretical problems concerning matrices does not seem to be sufficiently 
realized. 

It is clear that if the elements of $a$ are allowed to increase indefinitely, then the elements of $(a'a)^{-1}$
will decrease indefinitely, and so some restraint on the elements of $a$ must be imposed to ensure a
realistic solution. We shall follow customary practice and assume that this restraint fixes the sum of
squares of the $i$th column of $a$ as $c_i$.

Let $u$ be an upper triangular matrix ($u_{ij} = 0$ if $i > j$) such that $(a'a) = u'u$.

Since $(a'a)$ is positive definite, $u$ is non-singular and has an inverse $v$ which is upper triangular. Clearly
$v_{ii} = 1/u_{ii}$, and $(a'a)^{-1} = vv'$. Suppose the $i$th diagonal element of $(a'a)^{-1}$ is $a_i$, then

$$a_i = \sum_j v_{ij}^2 \ge v_{ii}^2 = 1/u_{ii}^2 \ge 1\Big/\sum_j u_{ji}^2 = 1/c_i.$$

For $a_i$ to achieve its lower bound $1/c_i$ both inequalities must degenerate to equalities, and for this to
occur, for all $i$, all the elements $v_{ij}$ with $i \ne j$ must vanish. This will establish the first in each pair of
equalities required. In that case $v$ will be diagonal, and hence so will $u$, showing that the second of each
pair of equalities also will be satisfied.

Thus the condition to achieve the lower bounds simultaneously is that a’a is diagonal, or equivalently, 
that a has orthogonal columns. 
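
The bound can be checked numerically. The following is a minimal sketch, not part of the original note: it forms the triangular resolution $a'a = u'u$, inverts $u$ by back-substitution, and compares the diagonal elements of $(a'a)^{-1}$ with $1/c_i$. The function names and the two small design matrices are illustrative assumptions.

```python
import math

def cholesky_upper(m):
    """Upper triangular u with m = u'u, for a symmetric positive definite m."""
    n = len(m)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = m[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
            if i == j:
                u[i][i] = math.sqrt(s)
            else:
                u[i][j] = s / u[i][i]
    return u

def invert_upper(u):
    """Inverse v of an upper triangular u, by back-substitution (v is upper triangular)."""
    n = len(u)
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v[j][j] = 1.0 / u[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(u[i][k] * v[k][j] for k in range(i + 1, j + 1))
            v[i][j] = -s / u[i][i]
    return v

def inverse_diagonal(a):
    """Return (diagonal of (a'a)^-1, column sums of squares c_i) for a design matrix a."""
    rows, cols = len(a), len(a[0])
    ata = [[sum(a[r][i] * a[r][j] for r in range(rows)) for j in range(cols)]
           for i in range(cols)]
    v = invert_upper(cholesky_upper(ata))
    # (a'a)^-1 = v v', so its i-th diagonal element is the sum of squares of row i of v.
    diag = [sum(v[i][j] ** 2 for j in range(cols)) for i in range(cols)]
    c = [ata[i][i] for i in range(cols)]
    return diag, c

# Hypothetical designs: the first has orthogonal columns, the second does not.
orthogonal = [[1, 1], [1, -1], [1, 1], [1, -1]]
oblique = [[1, 0], [1, 1], [1, 2], [1, 3]]

for design in (orthogonal, oblique):
    diag, c = inverse_diagonal(design)
    print(diag, [1.0 / ci for ci in c])
```

For the orthogonal design the two printed lists coincide; for the oblique design every diagonal element exceeds its bound.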


REFERENCE 
Turing, A. M. (1948). Quart. J. Mech. Appl. Math. 1, 287.


Tables of percentage points of the ‘Studentized’ extreme deviate from the sample mean 


By K. R. NAIR 
Forest Research Institute, Dehra Dun, India 


Denote by $x_{(1)}, \ldots, x_{(n)}$ a random sample of $n$ observations drawn from a normal population with standard
deviation $\sigma$. Let $x_1, \ldots, x_n$ be the same sample arranged in ascending order of magnitude so that $x_r$ is the
$r$th ranked (or ordered) variate in the sample $\{x_{(p)}\}$.

If an estimate $s_\nu$ of the unknown $\sigma$ is available with $\nu$ degrees of freedom independent of the sample
$\{x_{(p)}\}$ the present author suggested the use of $(x_n - \bar{x})/s_\nu$ or $(\bar{x} - x_1)/s_\nu$ as a test criterion for a single outlier
$x_n$ (or $x_1$). In Tables 6 A and B of the author’s (1948) paper the lower and upper 5 and 1 % points of the
distribution of the ‘Studentized’ extreme deviate, $(x_n - \bar{x})/s_\nu$ or $(\bar{x} - x_1)/s_\nu$, were obtained for $n = 3$ to 9 and
for selected values of $\nu \ge 10$. Those tables have now been extended to cover a few more per cent points,
namely, 10, 2.5, 0.5 and 0.1. The values for all the six points, including the 5 and 1 % points previously
published, are given in Tables 1 A and B below.

The method used for constructing these tables and examples illustrating the use of the ‘Studentized’ 
extreme deviate in the analysis and interpretation of designed experiments have been given in the 
author’s (1948) paper to which reference is invited. 
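
As an illustration of how the criterion is applied, the sketch below computes both ‘Studentized’ extreme deviates for a small sample and compares the upper one with a tabled point. The sample, the value of $s_\nu$ and the function name are hypothetical; only the critical value 2.49 (upper 5 % point for $n = 5$, $\nu = 10$) is taken from Table 1B.

```python
def studentized_extreme_deviates(sample, s_nu):
    """Return ((x_n - mean)/s_nu, (mean - x_1)/s_nu) for the given sample."""
    mean = sum(sample) / len(sample)
    upper = (max(sample) - mean) / s_nu
    lower = (mean - min(sample)) / s_nu
    return upper, lower

# Hypothetical data: five observations and an independent estimate of sigma
# assumed to carry 10 degrees of freedom.
sample = [9.8, 10.1, 10.0, 10.3, 13.1]
s_nu = 0.9

upper, lower = studentized_extreme_deviates(sample, s_nu)

# Upper 5 % point for n = 5, nu = 10, as printed in Table 1B.
UPPER_5_PERCENT = 2.49
print(upper, upper > UPPER_5_PERCENT)
```

A value of the upper deviate beyond the tabled point would suggest, at that level, that the largest observation is an outlier.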










Table 1A. Lower percentage points of the ‘Studentized’ extreme deviate $(x_n - \bar{x})/s_\nu$ or $(\bar{x} - x_1)/s_\nu$.

      |                10 % points                 |                 5 % points
ν \ n |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  3   |  4   |  5   |  6   |  7   |  8   |  9
10    | 0.29 | 0.45 | 0.57 | 0.67 | 0.74 | 0.81 | 0.87 | 0.20 | 0.35 | 0.46 | 0.55 | 0.62 | 0.69 | 0.74
15    | 0.29 | 0.45 | 0.57 | 0.67 | 0.75 | 0.82 | 0.88 | 0.20 | 0.35 | 0.46 | 0.55 | 0.63 | 0.70 | 0.75
30    | 0.29 | 0.45 | 0.58 | 0.67 | 0.76 | 0.83 | 0.89 | 0.20 | 0.35 | 0.46 | 0.56 | 0.64 | 0.70 | 0.77
∞     | 0.29 | 0.45 | 0.58 | 0.68 | 0.76 | 0.84 | 0.90 | 0.20 | 0.35 | 0.47 | 0.56 | 0.65 | 0.72 | 0.78

      |                2.5 % points                |                 1 % points
ν \ n |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  3   |  4   |  5   |  6   |  7   |  8   |  9
10    | 0.14 | 0.27 | 0.37 | 0.46 | 0.53 | 0.59 | 0.64 | 0.09 | 0.19 | 0.29 | 0.37 | 0.43 | 0.49 | 0.54
15    | 0.14 | 0.27 | 0.37 | 0.46 | 0.54 | 0.60 | 0.66 | 0.09 | 0.19 | 0.29 | 0.37 | 0.44 | 0.50 | 0.56
30    | 0.14 | 0.27 | 0.38 | 0.47 | 0.55 | 0.61 | 0.67 | 0.09 | 0.20 | 0.29 | 0.38 | 0.45 | 0.51 | 0.57
∞     | 0.14 | 0.27 | 0.38 | 0.48 | 0.56 | 0.62 | 0.69 | 0.09 | 0.20 | 0.30 | 0.38 | 0.46 | 0.53 | 0.59

      |                0.5 % points                |                0.1 % points
ν \ n |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  3   |  4   |  5   |  6   |  7   |  8   |  9
10    | 0.06 | 0.15 | 0.24 | 0.31 | 0.38 | 0.43 | 0.48 | 0.03 | 0.09 | 0.16 | 0.22 | 0.28 | 0.33 | 0.37
15    | 0.06 | 0.15 | 0.24 | 0.32 | 0.38 | 0.44 | 0.50 | 0.03 | 0.09 | 0.16 | 0.22 | 0.28 | 0.34 | 0.39
30    | 0.06 | 0.15 | 0.24 | 0.32 | 0.39 | 0.45 | 0.51 | 0.03 | 0.09 | 0.16 | 0.23 | 0.29 | 0.35 | 0.40
∞     | 0.06 | 0.16 | 0.25 | 0.33 | 0.40 | 0.47 | 0.53 | 0.03 | 0.09 | 0.16 | 0.23 | 0.30 | 0.36 | 0.41

Table 1B. Upper percentage points of the ‘Studentized’ extreme deviate $(x_n - \bar{x})/s_\nu$ or $(\bar{x} - x_1)/s_\nu$.

      |                10 % points                 |                 5 % points
ν \ n |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  3   |  4   |  5   |  6   |  7   |  8   |  9
10    | 1.68 | 1.93 | 2.11 | 2.25 | 2.36 | 2.46 | 2.54 | 2.02 | 2.29 | 2.49 | 2.63 | 2.75 | 2.85 | 2.93
11    | 1.66 | 1.91 | 2.08 | 2.21 | 2.32 | 2.42 | 2.49 | 1.99 | 2.26 | 2.44 | 2.58 | 2.70 | 2.79 | 2.87
12    | 1.65 | 1.89 | 2.05 | 2.19 | 2.29 | 2.38 | 2.46 | 1.97 | 2.22 | 2.40 | 2.54 | 2.65 | 2.75 | 2.83
13    | 1.63 | 1.87 | 2.04 | 2.16 | 2.27 | 2.36 | 2.43 | 1.95 | 2.20 | 2.38 | 2.51 | 2.62 | 2.71 | 2.79
14    | 1.62 | 1.85 | 2.02 | 2.14 | 2.25 | 2.33 | 2.41 | 1.93 | 2.18 | 2.35 | 2.48 | 2.59 | 2.68 | 2.76
15    | 1.61 | 1.84 | 2.00 | 2.13 | 2.23 | 2.31 | 2.39 | 1.92 | 2.16 | 2.33 | 2.46 | 2.56 | 2.65 | 2.73
16    | 1.61 | 1.83 | 1.99 | 2.12 | 2.22 | 2.30 | 2.37 | 1.90 | 2.14 | 2.31 | 2.44 | 2.54 | 2.63 | 2.70
17    | 1.60 | 1.82 | 1.98 | 2.10 | 2.20 | 2.28 | 2.35 | 1.89 | 2.13 | 2.30 | 2.42 | 2.52 | 2.61 | 2.68
18    | 1.59 | 1.82 | 1.97 | 2.09 | 2.19 | 2.27 | 2.34 | 1.88 | 2.12 | 2.28 | 2.41 | 2.51 | 2.59 | 2.66
19    | 1.59 | 1.81 | 1.97 | 2.09 | 2.18 | 2.26 | 2.33 | 1.87 | 2.11 | 2.27 | 2.39 | 2.49 | 2.58 | 2.65
20    | 1.58 | 1.80 | 1.96 | 2.08 | 2.17 | 2.25 | 2.32 | 1.87 | 2.10 | 2.26 | 2.38 | 2.48 | 2.56 | 2.63
24    | 1.57 | 1.78 | 1.94 | 2.05 | 2.15 | 2.22 | 2.29 | 1.84 | 2.07 | 2.23 | 2.35 | 2.44 | 2.52 | 2.59
30    | 1.55 | 1.77 | 1.92 | 2.03 | 2.12 | 2.20 | 2.26 | 1.82 | 2.04 | 2.20 | 2.31 | 2.40 | 2.48 | 2.55
40    | 1.54 | 1.75 | 1.89 | 2.01 | 2.09 | 2.17 | 2.23 | 1.80 | 2.02 | 2.17 | 2.28 | 2.37 | 2.44 | 2.51
60    | 1.52 | [illegible] | 1.87 | 1.98 | 2.07 | 2.14 | 2.20 | 1.78 | 1.99 | 2.14 | 2.25 | 2.33 | 2.41 | 2.47
120   | 1.51 | 1.71 | 1.85 | 1.96 | 2.05 | 2.12 | 2.18 | 1.76 | 1.97 | 2.11 | 2.21 | 2.30 | 2.37 | 2.43
∞     | 1.50 | 1.70 | 1.83 | 1.94 | 2.02 | 2.09 | 2.15 | 1.74 | 1.94 | 2.08 | 2.18 | 2.27 | 2.33 | 2.39





























































      |        2.5 % points              |                 1 % points
ν \ n |  5   |  6   |  7   |  8   |  9   |  3   |  4   |  5   |  6   |  7   |  8   |  9
10    | 2.84 | 2.99 | 3.10 | 3.20 | 3.28 | 2.76 | 3.05 | 3.25 | 3.39 | 3.50 | 3.59 | 3.67
11    | 2.78 | 2.93 | 3.04 | 3.14 | 3.22 | 2.71 | 3.00 | 3.19 | 3.33 | 3.44 | 3.53 | 3.61
12    | 2.74 | 2.88 | 2.99 | 3.08 | 3.16 | 2.67 | 2.95 | 3.14 | 3.28 | 3.39 | 3.48 | 3.55
13    | 2.70 | 2.84 | 2.95 | 3.04 | 3.12 | 2.63 | 2.91 | 3.10 | 3.24 | 3.34 | 3.43 | 3.51
14    | 2.67 | 2.80 | 2.91 | 3.00 | 3.08 | 2.60 | 2.87 | 3.06 | 3.20 | 3.30 | 3.39 | 3.47
15    | 2.64 | 2.77 | 2.88 | 2.97 | 3.04 | 2.57 | 2.84 | 3.02 | 3.16 | 3.27 | 3.35 | 3.43
16    | 2.62 | 2.75 | 2.85 | 2.94 | 3.01 | 2.55 | 2.81 | 3.00 | 3.13 | 3.24 | 3.32 | 3.39
17    | 2.60 | 2.73 | 2.83 | 2.92 | 2.99 | 2.52 | 2.79 | 2.97 | 3.10 | 3.21 | 3.29 | 3.36
18    | 2.58 | 2.71 | 2.81 | 2.89 | 2.97 | 2.50 | 2.77 | 2.95 | 3.08 | 3.18 | 3.27 | 3.34
19    | 2.56 | 2.69 | 2.79 | 2.87 | 2.95 | 2.49 | 2.75 | 2.92 | 3.06 | 3.16 | 3.24 | 3.31
20    | 2.55 | 2.67 | 2.77 | 2.86 | 2.93 | 2.47 | 2.73 | 2.91 | 3.04 | 3.14 | 3.22 | 3.29
24    | 2.50 | 2.62 | 2.72 | 2.80 | 2.87 | 2.43 | 2.68 | 2.85 | 2.97 | 3.07 | 3.15 | 3.22
30    | 2.46 | 2.58 | 2.67 | 2.75 | 2.82 | 2.38 | 2.62 | 2.79 | 2.91 | 3.01 | 3.08 | 3.15
40    | 2.42 | 2.53 | 2.62 | 2.70 | 2.76 | 2.34 | 2.57 | 2.73 | 2.85 | 2.94 | 3.02 | 3.08
60    | 2.38 | 2.49 | 2.58 | 2.65 | 2.71 | 2.30 | 2.52 | 2.68 | 2.79 | 2.88 | 2.95 | 3.01
120   | 2.34 | 2.45 | 2.53 | 2.60 | 2.66 | 2.25 | 2.48 | 2.62 | 2.73 | 2.82 | 2.89 | 2.95
∞     | 2.30 | 2.41 | 2.49 | 2.56 | 2.61 | 2.22 | 2.43 | 2.57 | 2.68 | 2.76 | 2.83 | 2.88
        0.5 % points              |                0.1 % points
3.52 | 3.65 | 3.76 | 3.85 | 3.92 | 3.54 | 3.84 | 4.04 | 4.17 | 4.28 | 4.35 | 4.40
3.46 | 3.60 | 3.71 | 3.79 | 3.86 | 3.49 | 3.80 | 3.99 | 4.12 | 4.23 | 4.30 | 4.36
3.41 | 3.55 | 3.66 | 3.74 | 3.81 | 3.45 | 3.75 | 3.94 | 4.07 | 4.19 | 4.26 | 4.31
3.37 | 3.50 | 3.61 | 3.70 | 3.77 | 3.41 | 3.71 | 3.90 | 4.03 | 4.14 | 4.22 | 4.28
3.33 | 3.46 | 3.57 | 3.66 | 3.73 | 3.38 | 3.67 | 3.86 | 4.00 | 4.10 | 4.18 | 4.24

3.16 | 3.29 | 3.39 | 3.48 | 3.55 | 3.23 | 3.51 | 3.70 | 3.83 | 3.93 | 4.01 | 4.08
3.09 | 3.22 | 3.32 | 3.40 | 3.47 | 3.16 | 3.44 | 3.62 | 3.75 | 3.85 | 3.93 | 4.00
3.02 | 3.15 | 3.25 | 3.32 | 3.39 | 3.08 | 3.36 | 3.53 | 3.66 | 3.76 | 3.84 | 3.90
2.96 | 3.08 | 3.17 | 3.25 | 3.31 | 3.01 | 3.27 | 3.44 | 3.57 | 3.66 | 3.74 | 3.81
2.89 | 3.01 | 3.10 | 3.17 | 3.23 | 2.93 | 3.19 | 3.35 | 3.47 | 3.56 | 3.64 | 3.70
2.83 | 2.94 | 3.02 | 3.09 | 3.15 | 2.85 | 3.10 | 3.26 | 3.37 | 3.46 | 3.53 | 3.59
2.76 | 2.87 | 2.95 | 3.02 | 3.07 | 2.78 | 3.01 | 3.17 | 3.28 | 3.36 | 3.43 | 3.48











REFERENCE 
Nair, K. R. (1948). Biometrika, 35, 118.













Extended and corrected tables of the upper percentage points 
of the ‘ Studentized’ range 


COMPUTED BY JOYCE M. MAY


EDITORIAL 


1. Need for revised tables. Denote by $x_1, x_2, \ldots, x_n$ a random sample of $n$ observations drawn from
a normal population with standard deviation $\sigma$ and arranged in ascending order of magnitude; then
the range in the sample is given by $w = x_n - x_1$. Denote further by $s^2$ a ‘mean-square’ estimate of $\sigma^2$,
independent of $w$, and based on $\nu$ degrees of freedom so that $\nu s^2/\sigma^2$ follows the $\chi^2$ distribution for
$\nu$ degrees of freedom. Then the studentized range is defined by
$$q = w/s = (x_n - x_1)/s. \tag{1}$$
When, during the war, tables of the upper percentage points of $q$ were computed (Pearson & Hartley,
1943) from those of the probability integral of $w$ (Pearson & Hartley, 1942), an expansion of the
‘Studentized’ integral in powers of $\nu^{-1}$ was used (Hartley, 1944). Since it was realized that for small $\nu$ and
large $q$ this expansion would break down, these tables were not extended to values of $\nu$ below 10, and for
$\nu \ge 10$ percentage points with values exceeding 6 were not regarded as reliable (Pearson & Hartley, 1943,
p. 90, lines 25-6). Recently, Prof. J. W. Tukey has drawn attention to the possibility of a breakdown in
the expansion at the larger values of $q$, resulting in errors of a magnitude far exceeding that anticipated
by these warnings. This query has prompted us to arrange for a recomputation of the upper percentage
points of $q$ by the considerably more laborious but numerically exact method of quadrature described
below, which, as will be seen from the attached tables, confirms Prof. Tukey’s suspicions. In view of the
more recent applications of ‘Studentized’ range (Patnaik, 1950; Hartley, 1950; David, 1951), it was
felt that the recomputation of the more important upper percentage points should cover all small $\nu$
down to $\nu = 1$. For these very small values of $\nu$ an expansion derived by Pillai, and described elsewhere
in this issue, has been found useful as a check. The upper 5 and 1 % points computed by Pillai for the
range $1 \le \nu \le 4$, $2 \le n \le 8$ were found to agree with the tabled results of our computations to within a unit
of the last figure quoted by Pillai.


Table of the upper percentage points of the ‘Studentized’ range $q = (x_n - x_1)/s$ (5 % points)


($n$ = sample size for range; $\nu$ = degrees of freedom of $s$)








ν \ n 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1/180 27 328 372 405 43-1 454 473 49-1 506 519 532 543 55-4 563 572 580 588 504 
2) 609 828 9-80 10-89 11-73 12-43 13-03 13-54 13-99 14:39 14-75 15-08 15:38 15-465 15-91 1614 16:36 16-57 16-77 
3} 450 588 683 751 804 S847 885 918 946 9-72 9-95 10-16 10:35 1052 10-69 10-84 10-98 11-12 11-24 
4/393 500 5-76 631 673 706 735 740 783 803 821 837 852 867 880 892 9-03 914 9-24 
5 | 361 454 518 564 599 628 652 O74 693 710 7:25 739 752 7T64 7-75 786 795 804 &13 
6) 346 434 490 5:31 543 580 612 632 649 665 679 692 TOL Tt 724 734 743 751 THO 
7 334 416 4068 506 535 5-59 RO 199 #15 629 G42 OSE Ot O75 ORE 693 7-01 708 Flt 
8) 325 404 453 489 SIT 540 540 77 «65-920 «6-05 618) «6-290 639) 6-48 657) 6650 673 680 RT 
9) 320 395 442 476 502 524 543 5460 5-74 587 598 609 1D 628 636 644 G5l 658 6405 

10 | 315 388 433 466 491 512 530 546 500 572 583 593 6-03 612 620 627 634 G41 647 
11 | 311 382 426 458 482 503 520 535 549 5S6L S571 5-81 590 598 606 6-14 620 627 638 
12} 308 3-77 420 451 475 #495 512 527 540 551 561 5-71 580 588 595 602 609 615 621 
138 | 306 373 #415 446 469 488 505 519 532 543 55 5663 571 579 586 593 600 606 Gl 
14/| 303 3-70 411 441 4464 483 499 5:13 525 536 546 556 544 5-72 5-79 586 592 598 603 
15 | 301 3467 408 437 #4509 478 #494 508 520 531 540 549 557 565 572 579 585 591 5m 
16 | 300 345 #405 434 456 474 490 503 5:15 526 535 S44 552 5-50 546 5-73 679 584 50 
17 | 298 362 402 431 452 470 486 409 G11 521 531 5:30 547 555 541 568 S74 579 5-4 
18 297 361 400 428 449 4467 483 406 5-07 SIT 527 535 543 550 557 563 540 5-74 5-79 
19 296 3:50 398 #426 447 464 4-79 #492 504 514 5:23 5:32 539 546 553 559 & 5-70 5-75 
20) 295 358 306 424 445 462 4:77 490 5:01 511 520 528 5:36 5-43 550 556 561 566 5-71 
WA) 202 353 390 417 437 454 408 481 492 5-01 5-10 5-18 525 532 538 5-44 5-50 5:55 5-59 
30 | 289 348 384 411 430 446 4060 472 483 492 540 SOR 515 521 527 5:33 538 5-43 5-48 
40 | 286 3-44 3-79 444 423 439 452 463 474 482 490 498 5:05 5-11 517 5:22 5:27 5-32 5-36 
60 | 283 340 374 398 416 431 444 455 465 473 481 $88 494 5:00 5:06 511 515 520 5-24 
120 2-80 336 349 392 410 424 436 447 456 464 471 478 464 490 495 500 504 509 5-13 
Cs) 277 332 3463 38 403 417 429 #4439 447 #=455 462 #468 #4474 #480 4 489 493 497 5-01 
















2. Method of computation. The computation of the present tables was based on the following
expression for the probability integral of ‘Studentized’ range at the upper tail:

$$\Pr\{w/s > q\} = c_\nu \int_0^\infty [4sZ(s)]^\nu\, \frac{1}{s}\, p(qs\,|\,n)\, ds = c_\nu \int_0^\infty \left[4\,\frac{w}{q}\, Z\!\left(\frac{w}{q}\right)\right]^\nu \frac{1}{w}\, p(w\,|\,n)\, dw, \tag{2}$$

where $Z(s) = (2\pi)^{-1/2} e^{-s^2/2}$, $c_\nu = 2\,\Gamma^{-1}(\tfrac12\nu)\,\{\tfrac14\sqrt{(\pi\nu)}\}^\nu$ and $p(w|n)$ is the upper tail probability integral of
range in samples of $n$. The factor 4 has been included within the brackets, [ ], for the computational
convenience of bringing the maximum of the function inside these brackets near to unity. A five-decimal
MS. of $p(w|n)$ (of which 4 decimals were published in the form $1 - p(w|n)$ by Pearson &
Hartley (1942)) formed the basis for the master table of $w^{-1}p(w|n)$ which was computed for
$n = 2\,(2)\,20$, $w = 0.25\,(0.25)\,2.0\,(0.5)\,8.0$.

To form the integrand in (2) the other factor, viz. $\left[4\,\frac{w}{q}\,Z\!\left(\frac{w}{q}\right)\right]^\nu$, which depends on $\nu$ and $q$, was
formed for $\nu = 1\,(1)\,4$, 6, 8, 12 and 24 and for some 6 to 8 equidistant values of $1/q$, covering the range of
the upper 5 and 1 % points of $q$ which were estimated roughly for this purpose. For each combination of
these values of $n$, $\nu$ and $q$, the integral in (2) was evaluated by numerical quadrature, using mainly
Gregory’s formula and utilizing, for small $\nu$, any symmetry or antisymmetry at $w = 0$ in the integrand
factors. For each combination of $n$ and $\nu$ (forming, as it were, a ‘master grid’ in $n$ and $\nu$) the resulting
integrals were differenced with regard to $1/q$ and the upper 5 and 1 % point of $q$ found by inverse
interpolation. These values were checked by differencing both $n$-wise and $\nu$-wise, and the remaining
percentage points in the present table found by Lagrangian and/or harmonic interpolation in the
values of the master grid. The resulting tables should be correct to within 3 units of the last figure quoted.
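
The quadrature in (2) can be tried out in the one case where the range integral has a simple closed form. The sketch below is an illustration under stated assumptions, not the computing scheme actually used: for $n = 2$ the range of two unit normal observations is $\sqrt{2}$ times the modulus of a unit normal deviate, so that $p(w|2) = \operatorname{erfc}(w/2)$ (a standard fact, not stated in the text); the integral is evaluated by the plain trapezoidal rule rather than by Gregory’s formula; and the function names, step count and upper limit of integration are arbitrary choices.

```python
import math

def z_ordinate(s):
    """Unit normal ordinate Z(s) = (2*pi)^(-1/2) * exp(-s^2 / 2)."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def range_tail_n2(w):
    """Assumed closed form of p(w|n) for n = 2: Pr{range of two unit normals > w}."""
    return math.erfc(0.5 * w)

def upper_tail(q, nu, p=range_tail_n2, s_max=12.0, steps=24000):
    """Pr{w/s > q} from the first integral in (2), by the trapezoidal rule."""
    c_nu = 2.0 / math.gamma(0.5 * nu) * (0.25 * math.sqrt(math.pi * nu)) ** nu
    h = s_max / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        # [4 s Z(s)]^nu / s is written as (4 Z(s))^nu * s^(nu - 1) so that s = 0 is harmless.
        f = (4.0 * z_ordinate(s)) ** nu * s ** (nu - 1) * p(q * s)
        total += f * (0.5 if i in (0, steps) else 1.0)
    return c_nu * h * total

# For n = 2, q / sqrt(2) is distributed as |t| with nu degrees of freedom, so the
# upper 5 % point for nu = 1 is sqrt(2) * tan(0.475 * pi).
q_exact_nu1 = math.sqrt(2.0) * math.tan(0.475 * math.pi)
print(upper_tail(q_exact_nu1, 1))   # should be close to 0.05
print(upper_tail(6.09, 2))          # n = 2, nu = 2 entry of the 5 % table, read here as 6.09
```

With $p \equiv 1$ the same routine returns the total probability of $s$, which gives a quick check on the constant $c_\nu$.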


REFERENCES 
David, H. A. (1951). Biometrika, 38, 393.
Hartley, H. O. (1944). Biometrika, 33, 173.
Hartley, H. O. (1950). Biometrika, 37, 271.
Patnaik, P. B. (1950). Biometrika, 37, 78.
Pearson, E. S. & Hartley, H. O. (1942). Biometrika, 32, 301.
Pearson, E. S. & Hartley, H. O. (1943). Biometrika, 33, 89.


Table of the upper percentage points of the ‘Studentized’ range $q = (x_n - x_1)/s$ (1 % points)


($n$ = sample size for range; $\nu$ = degrees of freedom of $s$)





2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19





900 134 164 186 202 216 227 237 246 253 260 266 272 = 277 282 286 291 295 298 


140 18-9 22: 24:7 266 282 29:5 30-7 31:7 3246 334 342 348 35:5 36:0 30:5 75 
5 het 13: ; 


8-26 10-56 12-17 13-34 1425 15-00 15-65 16-20 16-69 17-13 17-53 17-89 18-23 18-54 18-83 19-09 19-33 19-56 
651 808 917 997 10:58 11:10 11-55 11-93 12-26 12-56 12-84 13-09 13:32 13:53 13-73 13-92 14:09 14-25 
5-62 683 7:65 826 873 9-12 9-46 9-76 10-02 10:25 10-46 10-65 10-83 10-99 11-14 11-29 11-42 11-54 
5-24 632 7:03 756 797 831 861 887 910 9:30 949 965 981 9-95 10-08 10:21 10-32 10-43 
494 589 652 698 735 765 791 814 834 852 868 882 896 9:08 920 931 9-42 9-52 
4:74 563 620 663 696 7:24 747 1768 1786 803 818 831 844 855 8466 8-76 886 8-95 
460 542 596 635 666 691 7:13 1733 750 1765 779 791 802 813 823 833 8-41 8-50 
448 5:26 5-77 G14 643 667 688 17:06 722 736 749 760 771 781 791 799 808 815 
439 5:14 -562 5:98 625 647 667 684 699 7:13 725 736 746 T56 1745 17-73 7-81 7-88 
432 504 550 584 610 632 651 667 681 694 706 7:17 7-26 7:36 T44 7:52 760 7-67 
26 406 540 573 5-98 619 637 653 667 679 690 TOL 17:10 7:19 17:27 7:35 742 7-49 
421 489 532 564 588 608 626 641 654 666 677 687 696 705 713 7:20 727 7:34 
417 483 525 556 580 599 616 631 644 655 666 676 685 693 700 707 7-14 7-20 
$13 478 519 549 572 591 608 622 635 646 656 666 674 682 690 697 703 7-09 
410 473 514 543 566 585 601 615 627 638 648 657 666 673 681 687 69t 7-00 
407 470 509 538 560 579 595 608 620 631 641 650 658 665 6-73 679 685 6-91 
405 466 5:05 534 555 5-73 589 602 614 625 634 643 651 653 665 6-72 678 6-84 
402 463 502 530 5:51 569 584 597 609 619 628 637 645 652 659 666 671 6-77 
3-96 454 491 517 537 554 569 581 592 602 611 619 626 633 639 645 651 6-57 
389 445 480 505 524 540 553 565 5-76 585 593 601 608 614 620 626 631 6:36 
382 436 470 493 511 526 539 550 560 569 5-77 584 590 596 602 607 612 6-17 
3-76 428 460 482 499 513 525 536 545 553 560 567 573 5-78 583 588 593 5-98 
370 420 450 471 4067 500 512 521 530 538 544 550 556 561 566 5-71 575 5-79 
364 412 440 460 476 488 499 508 516 523 529 535 540 545 549 553 557 5-61 








Biometrika 39 













On the distribution of ‘Studentized’ range 


By K. C. S. PILLAI, University of Travancore, Trivandrum 


1. INTRODUCTION 


Let $x_1 < x_2 < \ldots < x_n$ denote an ordered sample of size $n$ and $X_1, X_2, \ldots, X_{\nu+1}$ a second independent
sample of size $\nu + 1$ from a normal population. Then the ‘Studentized’ range is defined as

$$q = w/s,$$

where $w = x_n - x_1$ and $s^2 = \sum_{1}^{\nu+1} (X_i - \bar{X})^2/\nu$. It may be mentioned that $q$ is a particular case of the general
class of ‘Studentized’ functions discussed by H. O. Hartley (1938). The usefulness of $q$ in practical
problems has been illustrated by Newman (1939), and by Pearson & Hartley (1943). Newman (1939)
obtained by quadrature, using the approximate probability law of $w$ due to Pearson, the 5 and 1 %
upper probability levels of $q$ for small values of $n$ and $\nu > 5$. Pearson & Hartley (1943) revised the table
of probability levels given by Newman with the help of the exact tables of the probability integral
for $w$ (Hartley, 1942; Pearson & Hartley, 1942); they have calculated the upper and lower percentage
points of $q$ for values of $n$ ranging from 2 to 20 and $\nu \ge 10$ and also the probability integral of $q$ for the
same values of $n$ and $\nu$ based on certain results obtained by Hartley (1944). In the present paper a
study of the distribution of $q$ has been based on the distribution of semi-range developed as a series
(Pillai 1948, 1951) and the percentage points have been calculated for small values of $n$ and $\nu$, to fill the
gap in the tables prepared by the former authors. The lower percentage points are given below for
values of $n$ ranging from 2 to 8 and $\nu < 10$; the upper percentage points for the same values of $n$ and $\nu < 5$
were also calculated.*


2. THE DISTRIBUTION OF SEMI-RANGE 


It has been shown (Pillai, 1951) that the distribution of semi-range, $W = \tfrac12(x_n - x_1)$, can be obtained in
the form


p(W) = kWn-2 e-tin+ a? Fr (1) 
n—1 2\ Kn-v 
h k=———— [- 
os nt 1(5) (=) 
and F = CF) +CW2+CPW'+..., (2) 


the $C$’s being functions of $n$. The $C^{(n)}$ coefficients for values of $n$ ranging from 3 to 8 have been calculated
and given in Table 1.


Table 1. $C^{(n)}$ coefficients for values of $n$ ranging from 3 to 8
































n    $C^{(n)}$ coefficients

3    31104     1728     201.6000      6.1714      0.3397      0.0042
4    98304     8192     1911.4667     42.8021     14.8634     -0.6523
5    240000    24000    7920.0000     311.7460    94.8921     -4.1893     2.4859
6    497664    55296    23040.0000    1243.4287   398.7097    -16.7465    13.7155
7    921984    109760   54618.6667    3565.9259   1263.4199   -42.1564    48.9035
8    1572864   196608   113049.6000   8374.0445   3293.8665   -73.9556    136.3427





* [Mr Pillai’s paper was submitted for publication when the more extensive programme of computation
described on pp. 192-3 of this issue of the journal was already in hand. His new upper percentage
points are not therefore printed separately as they are included in the main body of the tables on
pp. 192-3. Ed.]




3. THE DISTRIBUTION OF THE ‘STUDENTIZED’ RANGE 
Starting with the distribution law (1) of $W$ and the distribution of $s$ given by

$$p(s) = \frac{\nu^{\frac12\nu}}{2^{\frac12(\nu-2)}\,\Gamma(\tfrac12\nu)}\, s^{\nu-1} e^{-\frac12\nu s^2}, \tag{3}$$

putting $q = 2W/s$ in $p(W)\,.\,p(s)$, and integrating out $s$ in the interval 0 to $\infty$, the distribution of $q$ may be
easily seen to be of the form

kvv @ qnt3i-s Th(n+2i+v—1) 
=—— ” : 4 
P(q) T(4v) bad * Qiin+2i—D (1(n + 4) g? + vpn taite—D (4) 
As the terms in (4) reduce to Beta-functions, the percentage points can be evaluated with the help of 
Tables of the Incomplete Beta-function. 





4. PERCENTAGE POINTS OF q 


Using (4), upper and lower percentage points of $q$ were computed for small values of $n$ and $\nu$, for the
5 and 1 % significance levels; the lower percentage points are presented in Tables 2 and 3.


Table 2. Lower 5 % points of the ‘Studentized’ range

ν \ n    2      3      4      5      6      7      8
1      0.11   0.44   0.70   0.88   1.02   1.14   1.23
2      0.10   0.44   0.72   0.93   1.09   1.22   1.33
3      0.10   0.44   0.73   0.95   1.13   1.27   1.39
4      0.09   0.43   0.73   0.97   1.15   1.30   1.43
5      0.09   0.43   0.74   0.98   1.17   1.32   1.45
6      0.09   0.43   0.74   0.99   1.18   1.34   1.47
7      0.09   0.43   0.75   0.99   1.19   1.35   1.48
8      0.09   0.43   0.75   1.00   1.19   1.36   1.50
9      0.09   0.43   0.75   1.00   1.20   1.37   1.51








Table 3. Lower 1 % points of the ‘Studentized’ range

ν \ n    2      3      4      5      6      7      8
1      0.02   0.19   0.38   0.53   0.65   0.75   0.83
2      0.02   0.19   0.40   0.58   0.72   0.84   0.93
3      0.02   0.19   0.41   0.60   0.75   0.88   0.99
4      0.02   0.19   0.41   0.61   0.77   0.91   1.03
5      0.02   0.19   0.42   0.62   0.79   0.93   1.05
6      0.02   0.19   0.42   0.63   0.80   0.95   1.07
7      0.02   0.19   0.42   0.63   0.81   0.96   1.09
8      0.02   0.19   0.42   0.64   0.81   0.96   1.10
9      0.02   0.19   0.42   0.64   0.81   0.97   1.11
































REFERENCES 


Hartley, H. O. (1938). J.R. Statist. Soc. Suppl. 5, 80.
Hartley, H. O. (1942). Biometrika, 32, 334.
Hartley, H. O. (1944). Biometrika, 33, 173.
Newman, D. (1939). Biometrika, 31, 20.
Pearson, E. S. & Hartley, H. O. (1942). Biometrika, 32, 301.
Pearson, E. S. & Hartley, H. O. (1943). Biometrika, 33, 89.
Pillai, K. C. S. (1948). Sankhyā, 8, 375.
Pillai, K. C. S. (1951). Sankhyā, 11, 23.




Note on a certain family of discrete distributions 


By J. S. MARITZ
National Institute for Personnel Research, South African Council for Scientific and Industrial Research 
1. We wish to distinguish between ‘addition’ and ‘superimposition’ of probability distributions and
will define the terms as follows:

If we have a number of probability distributions $p_1(x_1), p_2(x_2), \ldots$, then $p(x) = p_1(x_1) + p_2(x_2) + \ldots$ will
be the ‘superimposition’ of these distributions, and $p(x) = p(x_1 + x_2 + \ldots)$ will be their ‘addition’.


2. We will consider the distribution obtained by the addition of two correlated Poisson variables,
since the ‘model’ of §3 is a direct extension of this model to $k$ variates.
Consider two events $A$ and $B$ of which only the following (mutually exclusive) combinations may
happen during an interval of time $\delta t$:
$$A\bar{B},\quad B\bar{A},\quad AB,\quad \bar{A}\bar{B},$$
and let the probabilities associated with these possible (joint) events be
$$a\,\delta t,\quad b\,\delta t,\quad c\,\delta t,\quad 1-(a+b+c)\,\delta t.$$

Let the random variables associated with $A$ and $B$ be $x$ and $y$, and let the joint probability of $A$ occurring
$x$ times and $B$ $y$ times during the interval 0 to $t$ be $p(x, y, t)$. Then

$$p(x,y,t+\delta t) = p(x-1,y,t)\,a\,\delta t + p(x,y-1,t)\,b\,\delta t + p(x-1,y-1,t)\,c\,\delta t + p(x,y,t)\,(1-(a+b+c)\,\delta t). \tag{1}$$

Defining
$$\phi_{xy}(\theta_1,\theta_2,t) = \sum_{x,y} \theta_1^x\theta_2^y\, p(x,y,t), \tag{2}$$
we have
$$\phi_{xy}(\theta_1,\theta_2,t+\delta t) = \phi_{xy}(\theta_1,\theta_2,t)\,\theta_1 a\,\delta t + \phi_{xy}(\theta_1,\theta_2,t)\,\theta_2 b\,\delta t + \phi_{xy}(\theta_1,\theta_2,t)\,\theta_1\theta_2 c\,\delta t + \phi_{xy}(\theta_1,\theta_2,t)\,(1-(a+b+c)\,\delta t), \tag{3}$$
so that
$$\frac{1}{\phi_{xy}}\frac{\delta\phi_{xy}}{\delta t} = a\theta_1 + b\theta_2 + c\theta_1\theta_2 - (a+b+c). \tag{4}$$
Whence, letting $\delta t \to 0$ and since at $t = 0$, $p(0,0,t) = 1$ and $p(x,y,t) = 0$ for $x, y \ne 0$,
$$\phi_{xy}(\theta_1,\theta_2,t) = \exp\{(a\theta_1 + b\theta_2 + c\theta_1\theta_2 - (a+b+c))\,t\}. \tag{5}$$

Now clearly the total probability of $A$ in $\delta t$ is
$$(a+c)\,\delta t,$$
so that $x$ is a Poisson variable with parameter $(a+c)t$. This can also be seen by putting $\theta_2 = 1$ in (5).
Similarly $y$ is a Poisson variable with parameter $(b+c)t$, and $\operatorname{cov}(x,y) = ct$. Putting $\theta_1 = \theta_2 = \theta$ in (5)
gives the ‘probability generating function’ (p.g.f.) of $z = x + y$,

$$\phi_z(\theta,t) = \exp\{(c\theta^2 + (a+b)\theta - (a+b+c))\,t\}. \tag{6}$$
A description of the bivariate Poisson distribution leading to (5) is given in Aitken (1944).


3. As an extension of the model of § 2, consider $n$ events $e_1, e_2, \ldots, e_n$ of which any combination of the
type $e_1, \bar{e}_2, e_3, e_4, \ldots, \bar{e}_n$ (for example), which contains every $e_i$ only once, may occur during the interval of
time $\delta t$. (These $\sum_{r=0}^{n}\binom{n}{r}$ possible combinations are assumed mutually exclusive.) Let the probability
associated with the occurrence of such a combination, denoted by $\varepsilon$, say, be $a_\varepsilon\,\delta t$, and let the probability
of any combination which contains any $e_i$ more than once be $O(\delta t^2)$.

Let the random variables associated with $e_1 \ldots e_n$ be $x_1 \ldots x_n$ and denote $(x_1 \ldots x_n)$ by $X$. Clearly, the
total probability of occurrence of $e_i$, say, during $\delta t$ is $(\sum_i a_\varepsilon)\,\delta t + O(\delta t^2)$, where $\sum_i$ denotes summation
over all possible $\varepsilon$ containing $e_i$, so that $x_i$ is a Poisson variable.

We now wish to determine the joint distribution of the variables $x_1 \ldots x_n$. Writing the joint probability
of $e_1$ occurring $x_1$ times, $e_2$ occurring $x_2$ times, etc., during the time interval 0 to $t$ as $p(x_1 \ldots x_n, t)$ we define
the joint p.g.f. of $x_1 \ldots x_n$ by

$$\phi_{x_1 \ldots x_n}(\theta_1 \ldots \theta_n, t) = \phi(\theta_1 \ldots \theta_n, t) = \sum \theta_1^{x_1} \ldots \theta_n^{x_n}\, p(x_1 \ldots x_n, t).$$







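Let $(I)$ be a vector containing 1’s corresponding to the $e_i$ in $\varepsilon$ and 0’s corresponding to the $\bar{e}_i$ in $\varepsilon$. The
following scheme will make this clear:

$$(\varepsilon) = (e_1, \bar{e}_2, e_3, e_4, \ldots, \bar{e}_n),$$
$$(I) = (1, 0, 1, 1, \ldots, 0), \tag{7}$$
so that $(X) - (I) = (X') = (x_1 - 1, x_2, x_3 - 1, x_4 - 1, \ldots, x_n)$.
Let $\theta_\varepsilon$ denote the product of all the $\theta_i$ corresponding to the $e_i$ in $\varepsilon$. We then have
$$p(X, t+\delta t) = \sum_\varepsilon p(X', t)\,a_\varepsilon\,\delta t + p(X,t)\Big(1 - \sum_\varepsilon a_\varepsilon\,\delta t\Big) + O(\delta t^2), \tag{8}$$
where $\sum_\varepsilon$ denotes summation over all combinations $\varepsilon$, whence
$$\phi(\theta_1 \ldots \theta_n, t+\delta t) - \phi(\theta_1 \ldots \theta_n, t) = \sum_\varepsilon \phi(\theta_1 \ldots \theta_n, t)\,\theta_\varepsilon a_\varepsilon\,\delta t - \sum_\varepsilon \phi(\theta_1 \ldots \theta_n, t)\,a_\varepsilon\,\delta t + O(\delta t^2). \tag{9}$$
Letting $\delta t \to 0$ we have
$$\frac{1}{\phi(\theta_1 \ldots \theta_n, t)}\frac{d\phi(\theta_1 \ldots \theta_n, t)}{dt} = \sum_\varepsilon (\theta_\varepsilon a_\varepsilon - a_\varepsilon), \tag{10}$$
and since at $t = 0$, $p(0 \ldots 0, t) = 1$ and $p(x_1 \ldots x_n, t) = 0$ for any $x_i \ne 0$,
$$\phi(\theta_1 \ldots \theta_n, t) = \exp\Big\{t\sum_\varepsilon (\theta_\varepsilon a_\varepsilon - a_\varepsilon)\Big\}. \tag{11}$$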


(11) defines the joint p.g.f. of $n$ variables each of which has a Poisson distribution. As can be seen quite
easily by putting $n = 2$ these $n$ variables are in general not independent.

By putting $\theta_1 = \theta_2 = \ldots = \theta_n = \theta$ we obtain the p.g.f. of the variable $z = x_1 + x_2 + \ldots + x_n$. We thus
obtain a distribution which is the addition of a set of correlated Poisson variables as contrasted with the
super-imposition of independent Poisson distributions which is encountered quite commonly, especially
in the literature concerning ‘Accident Proneness’.

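From (11), the p.g.f. $\phi_z(\theta, t)$ of $z$ is obtained as

$$\phi_z(\theta,t) = \exp\Big\{t\sum_r (a'_r\theta^r - a'_r)\Big\} = \exp\Big\{\sum_r (a_r\theta^r - a_r)\Big\} = g(\theta), \tag{12}$$

where $a'_r$ is the sum of the $a_\varepsilon$ over all combinations $\varepsilon$ containing exactly $r$ of the events, and $a_r = t a'_r$.
From (12) we have
$$p(z) = \exp\Big(-\sum_{r=1}^{n} a_r\Big) \sum_{s=1}^{z} \frac{1}{s!} \sum_{r_1,\ldots,r_s=1}^{n} a_{r_1}\cdots a_{r_s}, \tag{13}$$
where $r_1 + \ldots + r_s = z$ and each $r_i \ge 1$. More explicitly
$$\begin{aligned}
p(0) &= e^{-\Sigma a_r},\\
p(1) &= e^{-\Sigma a_r}\, a_1,\\
p(2) &= e^{-\Sigma a_r}\Big(\frac{a_1^2}{2!} + a_2\Big),\\
p(3) &= e^{-\Sigma a_r}\Big(\frac{a_1^3}{3!} + a_1 a_2 + a_3\Big),\\
p(4) &= e^{-\Sigma a_r}\Big(\frac{a_1^4}{4!} + \frac{a_1^2 a_2}{2!} + a_1 a_3 + \frac{a_2^2}{2!} + a_4\Big),
\end{aligned} \tag{14}$$
etc.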
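
For numerical work the probabilities need not be written out term by term. The sketch below is an illustration rather than the author's own procedure: differentiating $g(\theta) = \exp\{\sum_r (a_r\theta^r - a_r)\}$ gives $g'(\theta) = g(\theta)\sum_r r a_r\theta^{r-1}$, and equating coefficients of $\theta^{z-1}$ yields the recurrence $z\,p(z) = \sum_{r=1}^{\min(z,n)} r a_r\, p(z-r)$ with $p(0) = \exp(-\sum_r a_r)$. The function name and the parameter values are hypothetical.

```python
import math

def addition_pmf(a, z_max):
    """p(0), ..., p(z_max) for the p.g.f. exp(sum_r a_r (theta^r - 1)); a[r-1] holds a_r."""
    p = [math.exp(-sum(a))]
    for z in range(1, z_max + 1):
        s = sum(r * a[r - 1] * p[z - r] for r in range(1, min(z, len(a)) + 1))
        p.append(s / z)
    return p

# Hypothetical parameters a_1, ..., a_4 (so n = 4).
a1, a2, a3, a4 = 0.5, 0.3, 0.2, 0.1
p = addition_pmf([a1, a2, a3, a4], 60)

# Compare with the explicit expressions for p(2) and p(4).
e = math.exp(-(a1 + a2 + a3 + a4))
print(p[2], e * (a1 ** 2 / 2 + a2))
print(p[4], e * (a1 ** 4 / 24 + a1 ** 2 * a2 / 2 + a1 * a3 + a2 ** 2 / 2 + a4))
```

With $a_2 = a_3 = \ldots = 0$ the recurrence reduces to that of the ordinary Poisson distribution.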

It should be noted that if $n \to \infty$ a certain class of distributions is obtained, and it may be of interest
to try to determine which distributions belong to this class. Certainly those of which the $\log\phi(\theta)$ can be
expanded in an infinite Maclaurin series with radius of convergence $R > 1$, for

$$\psi(\theta) = \log\phi(\theta) = 0 \quad\text{at}\quad \theta = 1,$$

so that
$$\psi(0) = -\sum_{r=1}^{\infty}\frac{\psi^{(r)}(0)}{r!},$$
and
$$\phi(\theta) = \exp\Big\{\sum_{r=1}^{\infty}\Big(\frac{\psi^{(r)}(0)}{r!}\theta^r - \frac{\psi^{(r)}(0)}{r!}\Big)\Big\}, \tag{15}$$

which is of the same form as (12).











The negative binomial distribution belongs to this class as, for this p.d.,
$$\phi(\theta) = \Big(\frac{c}{c+1}\Big)^{p}\Big(1 - \frac{\theta}{c+1}\Big)^{-p},$$
where $c, p > 0$, and
$$\psi(\theta) = p\log\frac{c}{c+1} - p\log\Big(1 - \frac{\theta}{c+1}\Big),$$

which can be expanded in a Maclaurin series for $\theta < c + 1$.


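4. The factorial cumulant generating function of the distribution (13) is given as

$$g(T) = \log\phi(1+T) \tag{16}$$
$$= T\sum_{r=1}^{n} r a_r + \frac{T^2}{2!}\sum_{r=2}^{n} r(r-1)\,a_r + \ldots + T^n a_n, \tag{17}$$
so that
$$\kappa_{[s]} = \sum_{r=s}^{n} r(r-1)\ldots(r-s+1)\,a_r \tag{18}$$

and if $(\kappa_{[s]}) = M(a_r)$, where $M$ is a matrix, then $M$ is triangular, and so is $M^{-1}$. This gives some simple
relations which may be used to estimate the $a_r$, e.g. if $n = 3$,

$$\begin{aligned}
a_1 &= \kappa_{[1]} - \kappa_{[2]} + \tfrac12\kappa_{[3]},\\
a_2 &= \tfrac12\kappa_{[2]} - \tfrac12\kappa_{[3]},\\
a_3 &= \frac{1}{3!}\kappa_{[3]}.
\end{aligned} \tag{19}$$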
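
The relations between the $a_r$ and the factorial cumulants can be handled for any $n$ by back-substitution, since $\kappa_{[s]}$ involves only $a_s, a_{s+1}, \ldots, a_n$. The following sketch, with hypothetical function names and parameter values, computes the factorial cumulants from (18), recovers the $a_r$ from them, and checks the special case $n = 3$. It relies on `math.perm`, which needs Python 3.8 or later.

```python
import math

def factorial_cumulants(a):
    """kappa_[s] = sum_{r >= s} r (r-1) ... (r-s+1) a_r for s = 1..n; a[r-1] holds a_r."""
    n = len(a)
    return [sum(math.perm(r, s) * a[r - 1] for r in range(s, n + 1))
            for s in range(1, n + 1)]

def a_from_cumulants(kappa):
    """Invert the triangular system by back-substitution, starting from a_n."""
    n = len(kappa)
    a = [0.0] * n
    for s in range(n, 0, -1):
        rest = sum(math.perm(r, s) * a[r - 1] for r in range(s + 1, n + 1))
        a[s - 1] = (kappa[s - 1] - rest) / math.factorial(s)
    return a

# Hypothetical parameters for n = 3.
a_true = [0.5, 0.3, 0.2]
k1, k2, k3 = factorial_cumulants(a_true)
print(a_from_cumulants([k1, k2, k3]))
# The explicit relations for n = 3 give the same values.
print([k1 - k2 + 0.5 * k3, 0.5 * k2 - 0.5 * k3, k3 / 6])
```

In practice the $\kappa_{[s]}$ would be replaced by sample factorial cumulants, which is outside the scope of this sketch.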


5. Practical considerations. (a) A possible practical application of the distribution (13) is to the 
so-called ‘c chart’ (see Grant, 1946, p. 295). Current practice is to use control limits based on the Poisson 
distribution. This presupposes that the defects of the parts making up the unit occur independently. 
This need not always be so, in which case (13) should be applicable—with the appropriate number of 
parameters. Unfortunately we have no such data to which to apply the theory. 

(b) It is felt that a distribution of type (13) may be useful in certain problems of absenteeism in which
the absences of all individuals are not independent events. Again we have no data in support of this 
suggestion, but it is hoped that some will be available soon. 


The author wishes to express his gratitude to the South African Council for Scientific and Industrial 
Research for permission to publish this paper. 


REFERENCES 


AITKEN, A. C. (1944). Statistical Mathematics. University Mathematical Texts, 2. Edinburgh and 
London: Oliver and Boyd. 
Grant, E. L. (1946). Statistical Quality Control. New York and London: McGraw-Hill Book Co. Inc. 


Some properties of runs in smoothed random series 
By ALISON M. GRANT, Department of Meteorology, University of Melbourne


The method of smoothing a series by means of a moving average with equal weights is often used to
reduce short-period fluctuations in series of experimental data. However, in order to draw valid conclusions
concerning the data from the series obtained by this process, a knowledge of the effects of such
a technique on random series is required. The present paper gives the solution of this problem for some
properties of runs, a run being defined as a series of values showing a continuous increase or decrease.
In particular, the frequency distribution of runs of different lengths is obtained as a function of the
extent of the moving average. Amplitudes, i.e. differences between highest and lowest values, of runs
of different lengths for both random series and derived series of averages, are also considered.


Distribution of runs of different lengths 
The distribution of runs in a random series has been studied by Kermack & McKendrick (1937); the
derivation, in rather a different form, is given here both in order to illustrate the method and because it
forms part of the study of runs in the smoothed series.




If the values of a continuous variate x are chosen at random from a population with probability
distribution function f(x), the function y, defined by the relation
\[ y = \int_{-\infty}^{x} f(x)\,dx, \]
will be distributed in a rectangular distribution in the range (0, 1). There is a one-to-one correspondence
between values of x and values of y, so that the inequality $x_1 < x_2$ implies the inequality $y_1 < y_2$, etc.
The conditions which must be satisfied for a particular point, say $x_0$, to be the start of an upward run
of length n, consisting of n consecutive upward steps preceded and succeeded by downward steps, are
given by
\[ x_{-1} > x_0 < x_1 < x_2 < \dots < x_n > x_{n+1}, \]
or alternatively by
\[ y_{-1} > y_0 < y_1 < y_2 < \dots < y_n > y_{n+1}. \]


The probability that these conditions will be satisfied is given by 


\[ \int_0^1 \int_{y_{n+1}}^1 \int_0^{y_n} \dots \int_0^{y_2} \int_0^{y_1} \int_{y_0}^1 dy_{-1}\,dy_0\,dy_1 \dots dy_{n-1}\,dy_n\,dy_{n+1}, \tag{1} \]
which is equal to
\[ \frac{n^2+3n+1}{(n+3)!}. \]
The probability of occurrence of a run of length n (up or down) is therefore given by
\[ P_n = \frac{2(n^2+3n+1)}{(n+3)!}. \]
The result $P_0 = \tfrac13$ indicates the expected frequency of points which do not start a run, i.e. in which
$x_{-1} < x_0 < x_1$ or $x_{-1} > x_0 > x_1$. As the starting point of a run of length n is followed by n - 1 such points,
we have
\[ \sum_{n=1}^{\infty} (n-1) P_n = P_0, \]
a result which may be used to determine the expected length of runs
\[ \mathcal{E}(n) = \sum_{n=1}^{\infty} n P_n \Big/ \sum_{n=1}^{\infty} P_n = 1.5. \]
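The run probabilities for the random series can be checked numerically. The following is a minimal sketch, assuming the closed form $P_n = 2(n^2+3n+1)/(n+3)!$ stated above; the function name `run_probability` and the truncation of the sums at n = 40 are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from math import factorial


def run_probability(n: int) -> Fraction:
    """P_n = 2(n^2 + 3n + 1)/(n + 3)! for a purely random series."""
    return Fraction(2 * (n * n + 3 * n + 1), factorial(n + 3))


# P_0 is the frequency of points that do not start a run.
P0 = run_probability(0)

# The factorial in the denominator makes the tail beyond n = 40 negligible,
# so finite sums stand in for the infinite series.
total_runs = sum(run_probability(n) for n in range(1, 41))
total_length = sum(n * run_probability(n) for n in range(1, 41))
mean_length = float(total_length / total_runs)

print(P0, float(total_runs), mean_length)
```

The printed values should be close to 1/3, 2/3 and 1.5 respectively, in line with the relations quoted in the text.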


Turning now to the results to be expected from a moving average of extent a, the typical term $z_t$ being
defined by
\[ a z_t = x_t + x_{t+1} + \dots + x_{t+a-1}, \]
where, as before, the values of x are chosen in a random manner, we find that the inequality $z_{t+1} > z_t$
requires $x_{t+a} > x_t$ and hence $y_{t+a} > y_t$. The conditions which must be satisfied for a run of a certain length
to occur now concern points which are separated by an interval a. It is convenient therefore to use
ma + s for the length of the run, where m = 0, 1, 2, ...; s = 0, 1, 2, ..., a - 1.

As the form of the conditions which must be satisfied by y for a run of length n (in z) depends on the 
value of s, it is simpler to consider the probability of occurrence of a run of length greater than or equal 


to ma + s. The inequalities
\[ z_{-1} > z_0 < z_1 < z_2 < \dots < z_{ma+s} \]
imply
\[
\left.
\begin{aligned}
y_s &< y_{a+s} < \dots < y_{ma+s} \\
y_{s+1} &< y_{a+s+1} < \dots < y_{ma+s+1} \\
&\;\;\vdots \\
y_{a-2} &< y_{2a-2} < \dots < y_{ma+a-2} \\
y_{-1} > y_{a-1} &< y_{2a-1} < \dots < y_{ma+a-1} \\
y_0 &< y_a < y_{2a} < \dots < y_{ma+a} \\
y_1 &< y_{a+1} < y_{2a+1} < \dots < y_{ma+a+1} \\
&\;\;\vdots \\
y_{s-1} &< y_{a+s-1} < y_{2a+s-1} < \dots < y_{ma+a+s-1}.
\end{aligned}
\right\} \tag{2}
\]
Each of the first a - s - 1 lines leads, by a process of successive integration, to the probability $1/(m+1)!$,
the next line to $1/(m+1)! - 1/(m+2)!$, while each of the remaining s lines leads to $1/(m+2)!$. The probability
that all of the conditions will be satisfied is therefore
\[ \left\{\frac{1}{(m+1)!}\right\}^{a-s-1} \left\{\frac{1}{(m+1)!} - \frac{1}{(m+2)!}\right\} \left\{\frac{1}{(m+2)!}\right\}^{s}, \]
which, when doubled to allow for both upward and downward runs, gives
\[ P(n \ge ma+s) = \frac{2(m+1)}{\{(m+1)!\}^{a}\,(m+2)^{s+1}}. \tag{3} \]
The probability of occurrence of a run of length ma + s is obtained as the difference between $P(n \ge ma+s)$
and $P(n \ge ma+s+1)$.
The result includes that for the random series, which can be considered as a special case with the
extent of the moving average equal to unity. For a > 1, $P_0 = \tfrac12$, showing that the number of peaks and
troughs is reduced from two-thirds to one-half of the total number of points by the averaging process.
The mean length of runs is now
\[ \mathcal{E}(n) = \frac{1}{1-P_0} = 2, \]
which is independent of the extent of the moving average (provided a > 1).

Relative frequencies of runs of different lengths are shown in Table 1 for the values a = 1, 3, and 5.
If 'runs' of length 0 are not considered, the frequencies for a = 1 have to be multiplied by 1.5, the others
by 2; but the numbers as they stand give the expected frequencies in terms of the total length of the
series. The values for a ≥ 9 are derived from the fact that the relative frequencies up to n = a - 2 are
given by $1/2^{n+1}$. The increase in the expected frequencies of longer runs as a increases is clearly shown,
the value n = a being specially preferred.


Table 1. Relative frequencies of runs of various lengths (× 100)

  n      a = 1     a = 3     a = 5     a ≥ 9
  0      33.33     50.00     50.00     50.00
  1      41.67     25.00     25.00     25.00
  2      18.33      8.33     12.50     12.50
  3       5.28     11.11      6.25      6.25
  4       1.15      3.71      2.09      3.13
  5       0.21      1.16      2.77      1.56
  6       0.03      0.52      0.93      0.78
  7       0.00      0.13      0.31      0.39
 ≥8       0.00      0.04      0.15      0.39
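The entries of Table 1 follow from equation (3) by differencing. The sketch below is one way to tabulate them, assuming (3) in the form $P(n \ge ma+s) = 2(m+1)/[\{(m+1)!\}^a (m+2)^{s+1}]$; the function names are hypothetical and exact rational arithmetic is used only for convenience.

```python
from fractions import Fraction
from math import factorial


def prob_run_at_least(n: int, a: int) -> Fraction:
    """Equation (3): P(run length >= n), writing n = m*a + s with 0 <= s < a."""
    m, s = divmod(n, a)
    return Fraction(2 * (m + 1), factorial(m + 1) ** a * (m + 2) ** (s + 1))


def prob_run_exactly(n: int, a: int) -> Fraction:
    """Probability of a run of length exactly n, by differencing (3)."""
    return prob_run_at_least(n, a) - prob_run_at_least(n + 1, a)


def table_column(a: int, max_n: int = 7) -> list:
    """Relative frequencies (x 100) for n = 0, ..., max_n, rounded to 2 places."""
    return [round(100 * float(prob_run_exactly(n, a)), 2) for n in range(max_n + 1)]


for extent in (1, 3, 5):
    print(extent, table_column(extent))
```

Values computed this way may differ from the printed table by a unit in the last figure (for instance at n = 4 and n = 5), presumably through rounding in the original computation.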























Mean amplitudes of runs 


By performing the integration (1) with a fixed value of $y_n$ it is possible to obtain the distribution of
$y_n$ for a given value of n. Thus (1) becomes
\[ \int_0^{y_n} \int_0^{y_n} \int_0^{y_{n-1}} \dots \int_0^{y_2} \int_0^{y_1} \int_{y_0}^1 dy_{-1}\,dy_0\,dy_1 \dots dy_{n-2}\,dy_{n-1}\,dy_{n+1}, \]
which is equal to
\[ \frac{y_n^{n+1}}{n!} - \frac{y_n^{n+2}}{(n+1)!}. \]
Since y, apart from the conditions imposed by the runs, is distributed in a rectangular distribution in
the range (0, 1), the conditional distribution of $y_n$, the end-value in an upward run of length n, is given by
\[ f(y_n)\,dy_n = \frac{2}{P_n}\left\{\frac{y_n^{n+1}}{n!} - \frac{y_n^{n+2}}{(n+1)!}\right\} dy_n. \]
If the distribution of x is of such a form that y can be expressed as a function of x, not involving integrals,
this leads directly to the distribution of $x_n$. In any case the expected values of $x_n$ and $x_0$, and hence of
the amplitude, can be found numerically if not in functional form. Thus
\[ \mathcal{E}(x_n) = \frac{2}{P_n}\int_0^1 x\left\{\frac{y^{n+1}}{n!} - \frac{y^{n+2}}{(n+1)!}\right\} dy, \]
where x is expressed in terms of y.
The expected value of each of the a values of x which go to make up $z_n$, the last value in a run in the
derived series, is found in a similar manner from the conditions (2). The additional condition for the
termination of the run, $y_{ma+s} > y_{ma+a+s}$, appears at the end of the first line of (2), but it must be borne
in mind that if s = a - 1, this is of the type
\[ y_{-1} > y_{a-1} < y_{2a-1} < \dots < y_{ma+a-1} > y_{ma+2a-1}, \]
which leads to the same results as the random case with m replacing n. The distributions of the
corresponding y values (apart from constant multipliers) are given in Table 2, together with the number of
terms which follow the law in question.





























Table 2

(i) s < a - 1
  No.          f(y)
  1            $(m+1)\,y^{m} - y^{m+1}$
  s + 1        $y^{m+1}$
  a - s - 2    $y^{m}$

(ii) s = a - 1
  No.          f(y)
  1            $(m+1)\,y^{m+1} - y^{m+2}$
  a - 1        $y^{m+1}$

For each type,
\[ \mathcal{E}(x) = \int_0^1 x f(y)\,dy \Big/ \int_0^1 f(y)\,dy, \]
and $\mathcal{E}(z_n)$ is equal to the average of the a values of $\mathcal{E}(x)$. Substitution of the value of x corresponding
to 1 - y leads in a similar manner to $\mathcal{E}(z_0)$, and hence to the expected value of the amplitude, $\mathcal{E}(z_n - z_0)$.

The simplest example is where x is distributed in a rectangular distribution in the range (0, 1). Here
x = y and the various integrals can be directly evaluated. The variation of mean amplitude with length
of run is shown by the heavy lines in Fig. 1 for the cases a = 1, 3 and 5. The amplitudes are expressed in
terms of $\sigma_x$, the standard deviation of x, so that they can be compared with corresponding results obtained
from other parent distributions. The factor a has been introduced in order to give a clearer picture of
the pattern of variation.

There is a marked increase in the importance of the longer runs in the smoothed series as compared 
with the random series (a = 1). Thus the averaging process not only increases the expected frequency 
of long runs, but also increases their amplitudes relative to that of short runs. 

The mean amplitudes of runs were also calculated for x distributed in a standard normal distribution. 
Their dependence on the extent of the average and on the length of run is indicated by the fine lines in 
Fig. 1. The values for the smoothed series (a = 3 and a = 5) are practically identical with those found 
from the rectangular distribution. For the random series (a = 1) there is a slight increase in the 
amplitudes of the longer runs over those obtained from the rectangular distribution, due to the removal 
of the absolute limitation on the values of x. 

This comparison suggests that the results obtained from the rectangular distribution give a fairly 
accurate picture of the behaviour of runs in series derived from other parent distributions. It is therefore 
of interest to carry the derivation a stage further in order to obtain the distributions of the amplitudes 
as well as the mean amplitude for given n. 


Distribution of amplitudes 


The joint distribution of $y_0$ and $y_n$ (for the random series) is obtained by integrating (1) with fixed
values of $y_0$ and $y_n$. The resulting expression
\[ \int_0^{y_n} \int_{y_0}^{y_n} \int_{y_0}^{y_{n-1}} \dots \int_{y_0}^{y_3} \int_{y_0}^{y_2} \int_{y_0}^{1} dy_{-1}\,dy_1\,dy_2 \dots dy_{n-2}\,dy_{n-1}\,dy_{n+1} \]
is equal to
\[ y_n (1 - y_0)\,\frac{(y_n - y_0)^{n-1}}{(n-1)!}. \]
As the unconditional distribution of y is rectangular, the joint conditional distribution of $y_0$ and $y_n$ is
\[ f(y_0, y_n)\,dy_0\,dy_n = \frac{2}{P_n}\,y_n (1 - y_0)\,\frac{(y_n - y_0)^{n-1}}{(n-1)!}\,dy_0\,dy_n. \]
Hence
\[ f(y_0, A_n)\,dy_0\,dA_n = \frac{2}{P_n}\,(y_0 + A_n)(1 - y_0)\,\frac{A_n^{n-1}}{(n-1)!}\,dy_0\,dA_n, \]
where $A_n = y_n - y_0$. The marginal distribution of $A_n$ is given by
\[ f(A_n)\,dA_n = \frac{1}{3P_n}\,\frac{A_n^{n-1}}{(n-1)!}\,\{1 + 3A_n - 3A_n^2 - A_n^3\}\,dA_n. \]
[Fig. 1. Variation of mean amplitude with length of run for a moving average of extent a. Horizontal axis: length of run.]





If x is distributed in a rectangular distribution in the range (0, 1), $A_n$ is equal to the amplitude of a run
of length n and the above function represents its probability distribution.

In the case of the moving average the distribution of the difference between each pair of y values, one
of which corresponds to one of the x values forming part of $z_0$, the other to one forming part of $z_n$, can
be found in the same way. The different types of distribution of A which arise in this manner are given
in Table 3, together with the number of times they occur and also their means and variances.

For particular values of a and n it is possible to derive the distribution of the amplitude, $z_n - z_0$, which
is equal to the mean of the a values of A chosen from the appropriate distributions. Thus for a moving
average of extent 2, and runs of length 2, so that m = 1 and s = 0, the joint distribution of the two
values of A is given by
\[ f(A_1, A_2)\,dA_1\,dA_2 = \tfrac94 (1 - A_1^2)(1 - A_2^2)\,dA_1\,dA_2. \]
It is readily shown that the distribution function of $B = \tfrac12(A_1 + A_2)$ takes the form
\[
f(B)\,dB =
\begin{cases}
\tfrac{3}{5} B\,(8B^4 - 40B^2 + 15)\,dB, & B \le \tfrac12; \\[4pt]
\tfrac{24}{5}(1 - B)^3 (B^2 + 3B + 1)\,dB, & B \ge \tfrac12.
\end{cases}
\]
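The two-branch density of B can be checked against a direct numerical convolution of the two marginal densities $\tfrac32(1 - A^2)$. The sketch below assumes the two values of A are independent, as the text does; the function names and the midpoint-rule step count are illustrative.

```python
def marginal(a: float) -> float:
    """Density of a single A for m = 1, s = 0, a = 2: (3/2)(1 - A^2) on (0, 1)."""
    return 1.5 * (1.0 - a * a)


def density_closed_form(b: float) -> float:
    """Closed-form density of B = (A1 + A2)/2 as given in the text."""
    if b <= 0.5:
        return 0.6 * b * (8 * b**4 - 40 * b**2 + 15)
    return 4.8 * (1 - b) ** 3 * (b * b + 3 * b + 1)


def density_by_convolution(b: float, steps: int = 2000) -> float:
    """Density of B from 2 * integral of f(u) f(2b - u) du (midpoint rule)."""
    total_sum = 2.0 * b
    lo, hi = max(0.0, total_sum - 1.0), min(1.0, total_sum)
    h = (hi - lo) / steps
    acc = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * h
        acc += marginal(u) * marginal(total_sum - u)
    return 2.0 * h * acc


for b in (0.2, 0.5, 0.75):
    print(b, density_closed_form(b), density_by_convolution(b))
```

The two columns printed for each B should agree to within the accuracy of the midpoint rule, and the two branches of the closed form meet continuously at B = 1/2.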


[Fig. 1, key to lines. Heavy lines: rectangular distribution of x. Fine lines: normal distribution of x.]


It does not seem likely that the amplitude distribution can be given simply in a general form. However,
as the amplitude is the mean of a independent variates, it will be approximately normally distributed
with mean and variance found from the a individual values, i.e.
\[ \mathcal{E}(z_n - z_0) = \frac{\sum \mathcal{E}(A)}{a}, \qquad V(z_n - z_0) = \frac{\sum V(A)}{a^2}. \]
The numerical values of $\mathcal{E}(A)$ and V(A) are given in Table 4 for the first few values of m. They can be
used to determine the mean and the standard deviation of the amplitude of runs of any given length in











Table 3

(i) s < a - 1
  No.          f(A)                     $\mathcal{E}(A)$                      V(A)
  2            $A^{m-1} - A^{m+1}$      $\dfrac{m(m+2)}{(m+1)(m+3)}$          $\dfrac{m(2m^2+8m+9)}{(m+1)^2(m+3)^2(m+4)}$
  s            $A^{m} - A^{m+1}$        $\dfrac{m+1}{m+3}$                    $\dfrac{2(m+1)}{(m+3)^2(m+4)}$
  a - s - 2    $A^{m-1} - A^{m}$        $\dfrac{m}{m+2}$                      $\dfrac{2m}{(m+2)^2(m+3)}$

(ii) s = a - 1
  No.          f(A)                                       $\mathcal{E}(A)$                              V(A)
  1            $A^{m-1} + 3A^{m} - 3A^{m+1} - A^{m+2}$    $\dfrac{m(m^2+5m+5)}{(m+4)(m^2+3m+1)}$        $\dfrac{2m(m^4+10m^3+34m^2+45m+22)}{(m+4)^2(m+5)(m^2+3m+1)^2}$
  a - 1        $A^{m} - A^{m+1}$                          $\dfrac{m+1}{m+3}$                            $\dfrac{2(m+1)}{(m+3)^2(m+4)}$




Table 4. Values of $\mathcal{E}(A)$ and V(A) (in brackets) for runs of length ma + s. (Unit = standard
deviation of original series)

(i) s < a - 1
  No.          m = 0     m = 1     m = 2     m = 3
  2            0         1.30      1.85      2.16
               (0)       (0.71)    (0.59)    (0.45)
  s            1.15      1.73      2.08      2.31
               (0.67)    (0.60)    (0.48)    (0.38)
  a - s - 2    0         1.15      1.73      2.08
               (0)       (0.67)    (0.60)    (0.48)

(ii) s = a - 1
  No.          m = 0     m = 1     m = 2     m = 3
  1            0         1.52      1.99      2.27
               (0)       (0.72)    (0.54)    (0.41)
  a - 1        1.15      1.73      2.08      2.31
               (0.67)    (0.60)    (0.48)    (0.38)


































a smoothed random series. Thus for a moving average of extent 4 and runs of length 5, so that m = 1,
s = 1, we have
    Mean amplitude = ¼(2 × 1.30 + 1.73 + 1.15) = 1.37,
    Variance of amplitude = (1/16)(2 × 0.71 + 0.60 + 0.67) = 0.168,
    Standard deviation = 0.41.

Both the mean and the standard deviation so obtained are expressed in units equal to the standard
deviation of the original random series. For the random series a = 1, s = 0, so that the amplitude
distributions, means, and variances are those appropriate to the single term in (ii).
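The worked example can be reproduced from the closed forms of Table 3. The following is a sketch only, assuming a rectangular parent on (0, 1), so that the unit of measurement is $1/\sqrt{12}$, and assuming the counts of each type of term are 2, s and a - s - 2 for s < a - 1, and 1 and a - 1 for s = a - 1; all function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def two_ended(m):
    """f(A) ~ A^(m-1) - A^(m+1): mean and variance (case (i), occurs twice)."""
    mean = Fraction(m * (m + 2), (m + 1) * (m + 3))
    var = Fraction(m * (2 * m * m + 8 * m + 9), (m + 1) ** 2 * (m + 3) ** 2 * (m + 4))
    return mean, var


def long_chain(m):
    """f(A) ~ A^m - A^(m+1): mean and variance."""
    return Fraction(m + 1, m + 3), Fraction(2 * (m + 1), (m + 3) ** 2 * (m + 4))


def short_chain(m):
    """f(A) ~ A^(m-1) - A^m: mean and variance (case (i), occurs a-s-2 times)."""
    return Fraction(m, m + 2), Fraction(2 * m, (m + 2) ** 2 * (m + 3))


def end_of_run(m):
    """f(A) ~ A^(m-1) + 3A^m - 3A^(m+1) - A^(m+2): case (ii), occurs once."""
    q = m * m + 3 * m + 1
    mean = Fraction(m * (m * m + 5 * m + 5), (m + 4) * q)
    poly = m**4 + 10 * m**3 + 34 * m**2 + 45 * m + 22
    var = Fraction(2 * m * poly, (m + 4) ** 2 * (m + 5) * q**2)
    return mean, var


def amplitude_moments(a, m, s):
    """Mean and variance of the amplitude, in units of the parent S.D. (and its square)."""
    if s == a - 1:
        terms = [end_of_run(m)] + [long_chain(m)] * (a - 1)
    else:
        terms = [two_ended(m)] * 2 + [long_chain(m)] * s + [short_chain(m)] * (a - s - 2)
    mean = sum(t[0] for t in terms) / a
    var = sum(t[1] for t in terms) / a**2
    # Rectangular parent on (0, 1): standard deviation 1/sqrt(12), variance 1/12.
    return float(mean) * sqrt(12.0), float(var) * 12.0


mean_amp, var_amp = amplitude_moments(a=4, m=1, s=1)
print(round(mean_amp, 2), round(var_amp, 3), round(sqrt(var_amp), 2))
```

With a = 4, m = 1, s = 1 this gives values that round to 1.37, 0.168 and 0.41, matching the example; with a = 1, s = 0 it falls back on the single term of case (ii).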


Discussion 

It has been shown that the moving average process applied to a random series gives rise to a series 
which is far from random in character. In particular, runs of short length are suppressed both in number 
and in amplitude. Also runs of length equal to the extent of the moving average are increased in number. 
This is evidence of the oscillatory nature of the moving average which has been studied previously by 
means of serial correlation coefficients, for example by Spencer-Smith (1947). The dangers of applying 
the technique when seeking information regarding oscillations in a series must again be emphasized, 
as the appearance of the data in this form is likely to be misleading. 

The results given here may be used as the basis for tests of randomness in a series in which each 
observation depends on average conditions over a period of time which is longer than the interval 
between observations. In such a case the series would be similar to a moving average and should be 
compared with a smoothed random series in order to study its basic properties. The distributions of 
frequencies and amplitudes of runs could be used to decide whether any fluctuations observed can be 
explained in this way. In this connexion it should be noted that the amplitudes of consecutive runs are 
not independent of one another so that the variance of mean amplitude, as calculated, is not strictly 
true. However, on the hypothesis that the series in question is a moving average based on a random 
series, one very abnormal value in the latter will affect only a limited number of terms in the former, 
and hence will affect at most a few runs. The tests suggested would therefore be approximately correct 
provided the length of the observed series is long compared with the extent of the moving average 
suspected. 


In conclusion, I wish to thank Associate-Professor M. H. Belz and Mr U. Radok for their helpful 
suggestions during the preparation of this paper. 
REFERENCES

Kermack, W. O. & McKendrick, A. G. (1937). Proc. Roy. Soc. Edinb. 57, 228.
Spencer-Smith, J. L. (1947). J.R. Statist. Soc. Suppl. 9, 104.


An approximation to the symmetrical incomplete beta function 
By J. H. CADWELL 


1. SUMMARY
The following approximation is developed:
\[ I_x(p,p) = \tfrac12 + L(p)\left\{\Phi(\xi) + \frac{2(p-1)(2p+1)}{(4p-1)^4}\,\Psi(\xi)\right\}, \]
where
\[ x = \tfrac12 + \left(\frac{\pi}{3}\right)^{\frac12} \Phi\left\{\left(\frac{3}{4p-1}\right)^{\frac12} \xi\right\}. \]
The functions Φ and Ψ are defined as
\[ \Phi(\xi) = \int_0^{\xi} \frac{e^{-\frac12 t^2}}{\sqrt{(2\pi)}}\,dt, \qquad \Psi(\xi) = \int_0^{\xi} \left(1 - \frac{t^6}{15}\right) \frac{e^{-\frac12 t^2}}{\sqrt{(2\pi)}}\,dt. \]
The latter function is only required to two-place accuracy and a short table is given. The function L(p)
is also tabled; for p > 3 it can be replaced by unity. This approximation is of use for p ≥ 2, where it gives
a maximum error of 0.00011; for p > 5 it will give five-figure accuracy.


The function $I_x(p, p)$ is needed in evaluating the area under Pearson curves of types II and VII. As
2p will in general not be an integer, use of the incomplete beta function tables raises difficulties in
interpolation. Thus for four-place accuracy, third differences in p are needed for small p, while for large
p second differences in x must be used.

The above method allows the use of four- or five-figure tables of the normal integral with mean
differences. It will be found that the saving in time on the interpolation involved is more than enough
to compensate for the auxiliary computations needed. The following relations indicate the range of
symmetrical Pearson curves covered:

    p ≥ 2,    $2\tfrac17 \le \beta_2 < \infty$;
    p ≥ 4,    $2\tfrac{5}{11} \le \beta_2 \le 4\tfrac12$;
    p ≥ 6,    $2\tfrac35 \le \beta_2 \le 3\tfrac34$.

It will be found that for p = 4, the simpler form
\[ I_x(p,p) = \tfrac12 + \Phi(\xi) \]
leads to a maximum error of 0.00045. For p = 6 this maximum error is halved.


2. DERIVATION OF THE APPROXIMATION 


The probability-density function of the median of a sample of (2p - 1) members from a population with
distribution function F(x) is proportional to
\[ \{F(x)(1 - F(x))\}^{p-1}\,dF(x). \]
For a normal parent population Hojo (1931) has shown that this is very nearly normal for quite
small p; for p = 2 he finds $\beta_2 = 3.035$. Thus we have approximately
\[ \{F(x)(1 - F(x))\}^{p-1}\,dF(x) \propto e^{-x^2/2\sigma^2}\,dx. \]
Thus the substitution t = F(x)
will give the following approximation:
\[ I_x(p,p) = \frac{1}{B(p,p)} \int_0^x t^{p-1}(1-t)^{p-1}\,dt \approx \int_{-\infty}^{\xi} \frac{e^{-\frac12 t^2}}{\sqrt{(2\pi)}}\,dt. \]
It turns out to be better to use
\[ t = \tfrac12 + \left(\frac{\pi}{3}\right)^{\frac12} \Phi\left\{\left(\frac{3}{4p-1}\right)^{\frac12} \zeta\right\}. \]
Expanding the function Φ about zero, we find that the integrand is proportional to
\[ \left\{1 - \frac{(p-1)\,\zeta^6}{15(4p-1)^3} - \frac{(p-1)\,\zeta^8}{35(4p-1)^4} - \dots\right\} e^{-\frac12 \zeta^2}. \]
Thus we have the approximation
\[ I_x(p,p) = \tfrac12 + K\left\{\Phi(\xi) - \frac{(p-1)}{(4p-1)^3} \int_0^{\xi} \frac{t^6}{15}\,\frac{e^{-\frac12 t^2}}{\sqrt{(2\pi)}}\,dt - \frac{3(p-1)}{(4p-1)^4} \int_0^{\xi} \frac{t^8}{105}\,\frac{e^{-\frac12 t^2}}{\sqrt{(2\pi)}}\,dt\right\}. \]
Here K is a normalizing factor depending on p. Table 1 illustrates the behaviour of the incomplete sixth
and eighth moments. These values do not differ greatly, and as the coefficients of these incomplete
moments will be small, especially that of the eighth, we replace the latter by the sixth moment and
derive
\[ I_x(p,p) = \tfrac12 + K\left\{\Phi(\xi) - \frac{2(p-1)(2p+1)}{(4p-1)^4} \int_0^{\xi} \frac{t^6}{15}\,\frac{e^{-\frac12 t^2}}{\sqrt{(2\pi)}}\,dt\right\}. \]


Table 1. Values of standardized incomplete moments

  ξ              1.0     1.5     2.0     2.5     3.0     3.5     4.0
  6th moment     0.00    0.03    0.11    0.24    0.37    0.45    0.49
  8th moment     0.00    0.01    0.04    0.14    0.28    0.40    0.47






































Since the incomplete sixth moment is equal to Φ(ξ) - Ψ(ξ), and K will be very close to unity, this can be reduced to
\[ I_x(p,p) = \tfrac12 + L(p)\left\{\Phi(\xi) + \frac{2(p-1)(2p+1)}{(4p-1)^4}\,\Psi(\xi)\right\}. \]
We now choose L(p) so that $I_1(p, p)$ takes the value unity.
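As a rough numerical check of the approximation, the case p = 2 is convenient because $I_x(2,2) = 3x^2 - 2x^3$ exactly. The sketch below assumes Φ and Ψ as defined in the Summary, takes L(2) = 1.00090 from Table 3, and evaluates Ψ by Simpson's rule rather than from Table 2; the function names are illustrative.

```python
from math import erf, exp, pi, sqrt


def Phi(z: float) -> float:
    """Normal area from 0 to z."""
    return 0.5 * erf(z / sqrt(2.0))


def Psi(xi: float, steps: int = 2000) -> float:
    """Simpson's rule for the integral of (1 - t^6/15) phi(t) from 0 to xi."""
    if xi == 0.0:
        return 0.0

    def g(t: float) -> float:
        return (1.0 - t**6 / 15.0) * exp(-0.5 * t * t) / sqrt(2.0 * pi)

    h = xi / steps  # steps must be even for Simpson's rule
    total = g(0.0) + g(xi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0


def approx_symmetric_beta(xi: float, p: float, L: float = 1.0):
    """Return (x, approximate I_x(p, p)) for a given normal deviate xi."""
    x = 0.5 + sqrt(pi / 3.0) * Phi(sqrt(3.0 / (4.0 * p - 1.0)) * xi)
    coeff = 2.0 * (p - 1.0) * (2.0 * p + 1.0) / (4.0 * p - 1.0) ** 4
    return x, 0.5 + L * (Phi(xi) + coeff * Psi(xi))


def exact_p2(x: float) -> float:
    """I_x(2, 2) in closed form."""
    return 3.0 * x * x - 2.0 * x**3


x, approx = approx_symmetric_beta(1.0, p=2, L=1.00090)
print(x, approx, exact_p2(x), 0.5 + Phi(1.0))
```

The last printed value is the simpler form ½ + Φ(ξ), whose error at this point is of the order of the maximum quoted for p = 2, while the fuller approximation should agree with the exact value to about four decimal places.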


3. NUMERICAL DATA 


Table 2, giving the values of Ψ(ξ), will be found to give sufficient accuracy, as two places in this function
are quite adequate. More accurate values can be found from the tables of m, in Tables for Statisticians 
and Biometricians, vol. 1 (Pearson, K. 1914). 


Table 2. Values of Ψ(ξ)

  ξ     .0      .1      .2      .3      .4      .5      .6      .7      .8      .9
  0    0.000   0.040   0.079   0.118   0.156   0.191   0.226   0.258   0.288   0.315
  1    0.339   0.360   0.377   0.391   0.400   0.406   0.406   0.403   0.395   0.383
  2    0.367   0.348   0.326   0.302   0.272   0.249   0.222   0.196   0.171   0.147
  3    0.125   0.105   0.087   0.071   0.057   0.046   0.036   0.028   0.022   0.017




















The values of L( p) can be found from Table 3. 


Table 3. Values of L(p)

   p      L(p)         p      L(p)
  2.0    1.00090      3.0    1.00003
  2.2    1.00054      3.2    1.00002
  2.4    1.00023      3.4    1.00001
  2.6    1.00012      3.6    1.00000
  2.8    1.00006      3.8    1.00000




















In order to check the accuracy of the formula, the function was evaluated to five places at some
percentage points. The results are summarized in Table 4. The last line gives the maximum error of the
simpler approximation,
\[ I_x(p,p) = \tfrac12 + \Phi(\xi). \]

Table 4. Errors of the approximation

  True value                      p = 2        p = 3        p = 4
  1.0                              0            0            0
  0.995                           +0.00002     -0.00005     -0.00002
  0.99                            +0.00002     -0.00006     -0.00003
  0.975                           +0.00003     -0.00007     -0.00003
  0.95                            +0.00006     -0.00007     -0.00003
  0.90                            +0.00011     -0.00006     -0.00001
  0.75                            +0.00004     -0.00003     -0.00001
  0.50                             0            0            0

  Maximum error of simple form    -0.00205     -0.00085     -0.00045




















_—— 




The method used has been extended to $I_x(p, q)$ and has been found to give good results for p and q
moderately close together. However, the loss of symmetry greatly complicates the work, and the
approximation developed by Wise (1950) is much more useful.


4. APPLICATIONS 
The result for the Type II curve is
\[ y = y_0\left(1 - \frac{x^2}{a^2}\right)^{m}, \qquad F(x) = I_X(m+1, m+1), \]
where $X = \tfrac12(1 + x/a)$.

For the Type VII curve we have
\[ y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m}, \qquad F(x) = I_X(m - \tfrac12, m - \tfrac12), \]
where $X = \tfrac12[1 + x/(a^2 + x^2)^{\frac12}]$.
We note too that $I_x(p, \tfrac12)$ can be evaluated, since
\[ I_x(p, \tfrac12) = 2 I_z(p, p), \qquad 2z = 1 - (1 - x)^{\frac12}. \]
For the distribution of 'Student's' t with n degrees of freedom,
\[ \Pr(t \le t_0) = I_X(\tfrac12 n, \tfrac12 n), \]
where $X = \tfrac12[1 + t_0/(n + t_0^2)^{\frac12}]$.

A simple consequence of this expression does not seem to have been noticed. Let V denote the ratio
of two independent estimates of variance, each based on n degrees of freedom. Then the statistic
\[ \tfrac12 \sqrt{n}\,(V^{\frac12} - V^{-\frac12}) \]
has the t distribution on n degrees of freedom, the probability of the observed discrepancy being exceeded
in random sampling corresponding to twice the tail area of the t curve.


The author wishes to thank Dr H. O. Hartley for some suggestions as to the presentation of this 
work. Acknowledgement is made to the Chief Scientist, Ministry of Supply, for permission to publish 
this note. 

REFERENCES
Hojo, T. (1931). Biometrika, 23, 315.
Pearson, K. (Ed.) (1914). Tables for Statisticians and Biometricians, 1. University College, London:
Biometrika Office.
Wise, M. E. (1950). Biometrika, 37, 208.


The distribution of quantiles of small samples 
By J. H. CADWELL 


1. INTRODUCTION 


It is well known that for any continuous population, the quantile of a sample of size n has an
asymptotically normal distribution. Numerical investigations into the case of a normal parent population
reveal the fact that, while the asymptotic mean and variance are only approached slowly, the
convergence to normality is surprisingly fast.

E. S. Pearson & Adyanthaya (1928) evaluated the variance of the median for n = 3, 4 and 5. Hojo
(1931) extended this up to n = 12, and evaluated $\beta_2$ for n up to 8. He also gave a number of results for
quartiles. K. Pearson (1931b) developed another method which he applied to some special cases. Hojo
fitted polynomials in 1/n to his values of the quantity
\[ c_n = \frac{\text{S.D. median of } n \text{ values}}{\text{S.D. mean of } n \text{ values}}, \]


0-26 . “OF 
and obtained nodd: c,= i -- =A = a = ; 


2 n n2 ns 








\[ n \text{ even:} \quad c_n = \sqrt{\frac{\pi}{2}} - \frac{0.8261}{n} + \frac{0.7862}{n^2} - \frac{0.3478}{n^3} + \frac{0.1304}{n^4}. \]











In an appendix to Hojo’s paper K. Pearson (1931a) derived the result 


m 0-2690 o07ss 
nodd: c,= _" _ 


n n? 








This gives good agreement for n = 11, but there is a discrepancy at n = 3. Below we obtain approximate
expressions for $c_n$ and $\beta_2$ for the median. These are very close to Hojo's values down to the lowest value
of n. The method used is also applied to the general quantile of a sample from a normal population.

While a normal population has been used throughout, the method could be applied to any continuous 
parent distribution without difficulty. 


2. THE MEDIAN WHEN n= 2v+1 


Since the median value exceeds v values and is itself exceeded by v values, its probability density will be
\[ \frac{(2v+1)!}{v!\,v!}\,F^{v}(x)\{1 - F(x)\}^{v}\,dF(x), \]
where F(x) is the population distribution function.
By considering the integral
\[ \iint \exp\{-\tfrac12(x^2 + y^2)\}\,dx\,dy, \]
taken over the square with vertices (±a, ±a), and over a circle of equal area, we derive
\[ \Phi(a) = \tfrac12\left(1 - \exp\left(-\frac{2a^2}{\pi}\right)\right)^{\frac12}. \]
Here Φ(a) is the normal area from 0 to a. Using this result, the quantity
\[ \exp\left(-\frac{2vx^2}{\pi}\right) \exp\left(-\frac{x^2}{2}\right) dx \]
is found to be approximately proportional to the probability density of the median.
While the expression used for Φ(a) is close for all values of a, I have shown (1951) that a modification
of the equal areas condition leads to an improved result.
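How close the equal-area expression is to the normal area can be seen from a short tabulation. The sketch below simply compares the two functions on a grid of values of a from 0 to 5; the grid and the names are illustrative choices.

```python
from math import erf, exp, pi, sqrt


def normal_area(a: float) -> float:
    """Normal area from 0 to a."""
    return 0.5 * erf(a / sqrt(2.0))


def equal_area_approx(a: float) -> float:
    """Equal-area approximation (1/2) * sqrt(1 - exp(-2 a^2 / pi))."""
    return 0.5 * sqrt(1.0 - exp(-2.0 * a * a / pi))


grid = [k / 100.0 for k in range(0, 501)]
max_error = max(abs(normal_area(a) - equal_area_approx(a)) for a in grid)
print(max_error)
```

On this grid the largest discrepancy is a few units in the third decimal place, occurring for moderate values of a.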


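Adopting an alternative approach, and starting from the series expansion of Φ(x) about zero, the
steps given below follow in order:
\[
\begin{aligned}
4F(x)\{1 - F(x)\} &= 1 - \frac{2}{\pi}\left\{x^2 - \frac{x^4}{3} + \frac{7x^6}{90} - \dots\right\} \\
&= e^{-2x^2/\pi} + \frac{2(\pi - 3)}{3\pi^2}\,x^4 + \frac{(60 - 7\pi^2)}{45\pi^3}\,x^6 + \dots \\
&= e^{-2x^2/\pi}\left\{1 + \frac{2(\pi - 3)}{3\pi^2}\,x^4 - \frac{120 + 7\pi^2 - 60\pi}{45\pi^3}\,x^6 + \dots\right\}.
\end{aligned}
\]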


It will be found that the coefficient of $x^6$ is -0.0004. If this and higher powers be ignored, the median
has a probability density proportional to
\[ e^{-2vx^2/\pi}\left\{1 + \frac{2(\pi - 3)v}{3\pi^2}\,x^4\right\} e^{-\frac12 x^2}\,dx. \]
We write this probability density in the form
\[ A e^{-\frac12 y^2}(1 + k y^4)\,dy, \]
where we choose A so as to normalize the density, and put
\[ y = \left(\frac{\pi + 4v}{\pi}\right)^{\frac12} x, \qquad k = \frac{2(\pi - 3)v}{3(\pi + 4v)^2}. \]
After a little algebra we deduce
\[ \mu_2(y) = 1 + 12k + O(k^2), \qquad \beta_2(y) = 3 + 24k + O(k^2). \]
Hence we derive the approximate formulae
\[ c_n = \left(\frac{\pi(2v+1)}{\pi + 4v}\right)^{\frac12}\left\{1 + \frac{4(\pi - 3)v}{(\pi + 4v)^2}\right\}, \qquad \beta_2 = 3 + \frac{16(\pi - 3)v}{(\pi + 4v)^2}. \]


Table 1 shows the degree of agreement with Hojo's computed values. The first term of the approximation
to $c_n$, corresponding to the first approximation to Φ(a) given, gives for $c_3$ the value 1.149. If we expand
the expression for $c_n$ we obtain
\[ c_n = \sqrt{\frac{\pi}{2}}\left(1 - \frac{4 - \pi}{4n}\right) + O\left(\frac{1}{n^2}\right), \]
which agrees with Pearson's formula quoted above. We note that Hojo's regression result is also quite
close.


Table 1. Median in odd samples

   n                 c_n        β_2
   3    Correct     1.1602     3.0347
        Approx.     1.1615     3.0444
   7    Correct     1.2137     3.0272
        Approx.     1.2141     3.0296
  11    Correct     1.2286       -
        Approx.     1.2286       -
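The approximate entries of Table 1 can be recomputed from the two formulae for odd samples. The sketch below assumes $c_n = \{\pi(2v+1)/(\pi+4v)\}^{1/2}\{1 + 4(\pi-3)v/(\pi+4v)^2\}$ and $\beta_2 = 3 + 16(\pi-3)v/(\pi+4v)^2$ with n = 2v + 1; the function name is hypothetical.

```python
from math import pi, sqrt


def median_odd_approx(n: int):
    """Approximate c_n and beta_2 for the median of an odd sample of size n = 2v + 1."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    v = (n - 1) // 2
    base = pi + 4 * v
    correction = (pi - 3) * v / base**2
    c_n = sqrt(pi * (2 * v + 1) / base) * (1 + 4 * correction)
    beta2 = 3 + 16 * correction
    return c_n, beta2


for size in (3, 7, 11):
    print(size, median_odd_approx(size))
```

For n = 3 and n = 7 the output rounds to the tabulated approximate values; for n = 11 the value of $c_n$ obtained this way may differ from the printed figure by about a unit in the fourth decimal place.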




















3. THE MEDIAN WHEN n= 2v+2 


In this case the median is defined conventionally as the mean of the (v+1)th value $x_1$ and the (v+2)th
value $x_2$. The joint probability density for the variables $x_1$, $x_2$ is
\[ \frac{(2v+2)!}{v!\,v!}\,F^{v}(x_1)\{1 - F(x_2)\}^{v}\,dF(x_1)\,dF(x_2). \]
After a little manipulation we find that the probability density of the median is proportional to
\[ e^{-x^2}\,dx \int_0^{\infty} F^{v}(x - t)\{1 - F(x + t)\}^{v}\,e^{-t^2}\,dt. \]
This proves more troublesome to handle and we omit details. The results obtained are


re (4-7) i 3)v 
-_ (1 + 2v) 4(7 + 2v) eo. 








In deriving these results the substitution y = vt was employed so that we do not obtain the correct value
of $c_2$ by putting v = 0.

Table 2. Median in even samples

   n                 c_n        β_2
   4    Correct     1.0922     3.0191
        Approx.     1.0647     3.0216
   8    Correct     1.1600     3.0200
        Approx.     1.1509     3.0204
  12    Correct     1.1898       -
        Approx.     1.1844       -






















Table 2 compares the approximate values with those found by Hojo. Expanding $c_n$ in powers of 1/n


— _ [f,_8-™\, _ /m_o894l, of 
n= 15 | ie” | ee 7a) 


Hojo's empirical result agrees quite well with this value. It will be seen that convergence of $c_n$ is much
slower in this case. By using another term inside the bracket it is improved, so that $c_{12} = 1.1906$. The
necessity for this term corresponds to the relatively large coefficient of $1/n^2$ in Hojo's relationship.





4. THE GENERAL QUANTILE 


Consider a sample of np + nq + 1 values with np integral and p + q = 1. Then the probability density of
the (np + 1)th member of this sample will be
\[ (n+1)\binom{n}{np} F^{np}(x)\{1 - F(x)\}^{nq}\,dF(x). \]
Let the probability density of the parent population be expressed in a Taylor series about η, where
\[ F(\eta) = p, \qquad \frac{dF}{d\zeta} = c_0 + c_1\zeta + c_2\zeta^2 + \dots, \]
ζ being the deviation of x from η.
Taking logarithms of the probability density of the quantile, we obtain
\[ C + np\log\left(1 + \frac{c_0\zeta}{p} + \dots\right) + nq\log\left(1 - \frac{c_0\zeta}{q} - \dots\right) + \log\left(1 + \frac{c_1\zeta}{c_0} + \dots\right). \]
Expanding and collecting terms, this is found to reduce to
\[ C + \frac{c_1\zeta}{c_0} - \left(\frac{n c_0^2}{2pq} + \frac{c_1^2}{2c_0^2} - \frac{c_2}{c_0}\right)\zeta^2 + O(\zeta^3). \]
As n increases the sample quantile converges to the population value and ζ will be small. Hence ignoring
the terms in $\zeta^3$, etc., we find that the quantile is approximately normally distributed, with
\[ \text{Mean:}\quad \mu = \frac{\eta\,n c_0^2}{n c_0^2 + pq}, \qquad \text{Variance:}\quad \sigma^2 = \frac{pq}{n c_0^2 + pq}. \]
Here we have replaced $c_1$ and $c_2$ by their values for a normal curve. We compare these values with the
asymptotic formulae
\[ \text{Mean} = \eta, \qquad \text{Variance} = \frac{pq}{(n+1)\,c_0^2}. \]

Table 3

  Sample size                      7                 11
  p                                5/6               4/5
  Computed values     Mean        0.757             0.729
                      S.D.        0.507             0.407
  Approx. values      Mean        0.705 (0.754)     0.700 (0.731)
                      S.D.        0.518             0.412
  Asymptotic values   Mean        0.966             0.842
                      S.D.        0.562             0.432
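The approximate and asymptotic entries of Table 3 can be recomputed as follows. This is a sketch under stated assumptions: the parent is a unit normal, so that $c_0$ is the normal ordinate at η; the two cases are taken to be n = 6, p = 5/6 (sixth member of a sample of seven) and n = 10, p = 4/5; and the function name is hypothetical. `statistics.NormalDist` from the Python standard library supplies the inverse normal integral.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def quantile_moments(n: int, p: float):
    """Approximate and asymptotic mean and S.D. of the (np + 1)th member of n + 1 values."""
    q = 1.0 - p
    eta = NormalDist().inv_cdf(p)                    # population quantile
    c0 = exp(-0.5 * eta * eta) / sqrt(2.0 * pi)      # normal ordinate at eta
    denom = n * c0 * c0 + p * q
    approx_mean = eta * n * c0 * c0 / denom
    approx_sd = sqrt(p * q / denom)
    asymptotic_sd = sqrt(p * q / ((n + 1) * c0 * c0))
    return approx_mean, approx_sd, eta, asymptotic_sd


print(quantile_moments(6, 5 / 6))
print(quantile_moments(10, 4 / 5))
```

Values obtained in this way agree with the tabulated approximate and asymptotic figures to within about 0.002, the residual differences presumably reflecting the rounding of the original hand computation.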

















———— 




Table 3 shows the agreement with two cases taken from Hojo’s paper. It will be seen that the approxi- 
mate formulae provide results which are a big improvement on the asymptotic values. While further 
terms could be used, they become quite involved. The next approximation to the mean is found to be 


ut = ag WA apey 2co( p — 9)}- 
The values computed from this form are given in brackets in the table above. It is worth noting that in
the first case Hojo computed $\beta_1 = 0.026$, $\beta_2 = 3.088$.


Thus the distribution of the sixth member in a sample of seven is surprisingly close to normality, 
although, as appears from the above table, the limiting values of mean and variance are very poor 
approximations to the true values. 


The author wishes to thank Dr N. L. Johnson for some suggestions as to the presentation of this work. 
Acknowledgement is made to the Chief Scientist, Ministry of Supply, for permission to publish 
this note. 
REFERENCES

Cadwell, J. H. (1951). Biometrika, 38, 475.
Hojo, T. (1931). Biometrika, 23, 315.
Pearson, E. S. & Adyanthaya, N. K. (1928). Biometrika, 20A, 356.
Pearson, K. (1931a). Biometrika, 23, 361.
Pearson, K. (1931b). Biometrika, 23, 364.


On a correction term in the method of paired comparisons 
By J. A. van DER HEIDEN, Koninklijke/Shell-Laboratorium, Amsterdam 


The usual calculation according to the method of paired comparisons requires a correction term if
observers decline to express a preference between a pair of objects. I shall show that this correction term
is equal to one-quarter of the number of times of no-preference. In the method of paired comparisons
the ½n(n - 1) pairs out of a population of n objects are presented to m observers, who are supposed to
express a preference for one member of each pair. If $y_{ab}$ is the number of observers having a preference
for item a over item b,
\[ y_{ab} + y_{ba} = m \quad (a \ne b), \tag{1} \]
and
\[ y_{ab} = 0 \quad (a = b). \]
If x observers have a preference for a over b and i observers decline to express preferences for a or b,
$y_{ab}$ is defined by
\[ y_{ab} = x + \tfrac12 i. \tag{2} \]
Kendall & Babington Smith (1939) define a coefficient of agreement
\[ u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1, \tag{3} \]
where
\[ \Sigma = \sum_{a=1}^{n}\sum_{b=1}^{n}\binom{y_{ab}}{2}, \tag{4} \]
which is a measure of agreement in appreciation in case the observers have always expressed a preference,
i.e. when i = 0.

If i is not zero, the terms $\binom{y_{ab}}{2} + \binom{y_{ba}}{2}$ in (4) have to be replaced by
\[ 2^{-i}\sum_{j=0}^{i}\binom{i}{j}\left\{\binom{x+j}{2} + \binom{m-x-j}{2}\right\}, \tag{5} \]
as the probability that $y_{ab}$ is x + j (0 ≤ j ≤ i) and $y_{ba}$ = m - x - j is $\binom{i}{j}2^{-i}$. It will be shown that the
correction is equal to ¼i.
This correction is given by
\[ c(x, m, i) = 2^{-i}\sum_{j=0}^{i}\binom{i}{j}\left\{\binom{x+j}{2} + \binom{m-x-j}{2}\right\} - \binom{x+\tfrac12 i}{2} - \binom{m-x-\tfrac12 i}{2}. \tag{6} \]
We first calculate
\[ 2^{-i}\sum_{j=0}^{i}\binom{i}{j}\binom{x+j}{2} - \binom{x+\tfrac12 i}{2} = \frac{i}{4} - \frac{i^2}{8} - \frac{xi}{2} + 2^{-i}x\sum_{j=0}^{i}\binom{i}{j}j + 2^{-(i+1)}\sum_{j=0}^{i}\binom{i}{j}j(j-1). \tag{7} \]
Now
\[ \sum_{j=0}^{i}\binom{i}{j}j = i\sum_{j=1}^{i}\binom{i-1}{j-1} = i\,2^{i-1} \]
and
\[ \sum_{j=0}^{i}\binom{i}{j}j(j-1) = i(i-1)\sum_{j=2}^{i}\binom{i-2}{j-2} = i(i-1)\,2^{i-2}. \]
Substitution in (7) then gives
\[ 2^{-i}\sum_{j=0}^{i}\binom{i}{j}\binom{x+j}{2} - \binom{x+\tfrac12 i}{2} = \frac{i}{8}, \tag{8} \]
and since x does not occur in this result we likewise find
\[ 2^{-i}\sum_{j=0}^{i}\binom{i}{j}\binom{m-x-j}{2} - \binom{m-x-\tfrac12 i}{2} = \frac{i}{8}. \tag{9} \]
Hence
\[ c(x, m, i) = \tfrac14 i. \tag{10} \]
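The identity (10) is easily confirmed by direct enumeration. The sketch below assumes the correction is the binomially averaged sum of pair-counts minus the pair-counts evaluated at x + ½i and m - x - ½i, with $\binom{y}{2}$ read as y(y - 1)/2 for half-integral y; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pairs(y):
    """C(y, 2) = y(y - 1)/2, extended to half-integers."""
    return y * (y - 1) / 2


def correction(x: int, m: int, i: int) -> Fraction:
    """Averaged pair-counts over the 2^i splits of the i abstentions, minus the midpoint value."""
    averaged = sum(
        Fraction(comb(i, j), 2**i) * (pairs(Fraction(x + j)) + pairs(Fraction(m - x - j)))
        for j in range(i + 1)
    )
    midpoint = pairs(x + Fraction(i, 2)) + pairs(m - x - Fraction(i, 2))
    return averaged - midpoint


# Check the identity over a small range of m observers, i abstentions, x preferences.
checks = [
    correction(x, m, i) == Fraction(i, 4)
    for m in range(1, 9)
    for i in range(0, m + 1)
    for x in range(0, m - i + 1)
]
print(all(checks))
```

The check covers every admissible x for each m up to 8 and each number of abstentions i; the result does not depend on x or m, as equations (8) and (9) indicate.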


This result is in agreement with the numerical results of Kendall (1948), calculated for two examples
with i = 1 and i = 2.
REFERENCES

Kendall, M. G. (1948). Rank Correlation Methods. London: Charles Griffin and Co.
Kendall, M. G. & Babington Smith, B. (1939). Biometrika, 31, 324.





V—_—_——————_— 
















REVIEWS 


Chance and Choice by Cardpack and Chessboard. By L. Hogben. Max Parrish &
Co. Ltd. Pp. i+417. 50s. 


It is claimed that ‘Choice and Chance’ aims to make the logical basis of statistical theory, that is the 
theory of probability, accessible to any college student who has the intellectual equipment to make use 
of statistical methods or to benefit from a practical course, and the author claims that this is done 
against the background of the new theory of logical inference developed in America by Neyman’s 
School (!). What Professor Hogben has done is to write a textbook of statistical methods in which the 
proofs of probability theorems are magnificently illustrated by coloured pictures. 

Among the subjects covered are permutations and combinations, expectations and moments, distribution
of a proportion, confidence intervals, and correlation. The approach to the subject is not particularly
novel, and once the statistician has grasped Professor Hogben’s nomenclature and symbolism
he will find that there is little here which he has not read before. Statisticians appear, broadly speaking, 
to be divisible into two categories—those who find a diagram helpful and those who do not. The former 
class are appreciative of R. A. Fisher’s geometrical approach to sampling distributions and are undoubtedly
in the majority. This being so there will be many who will like to read this book, and possibly
be stimulated by it. 

While Prof. Hogben may stimulate the professional statistician by his book, it is to be doubted 
whether he will succeed in teaching the biologist and non-mathematical workers generally. Euclid is 
reported to have remarked that there is no royal road to learning and any non-mathematician who has 
patience to puzzle out Prof. Hogben’s diagrammatic approach would probably find his time better employed
with an old and tried classic such as Yule and Kendall’s Introduction to the Theory of Statistics. The
style of writing is irritating and at times unnecessarily florid. For example (p. 268), it is to be questioned 
whether the student’s knowledge of moment generating functions is increased by such explanations as 
‘Having witnessed the process of parturition specified by (xv) which delivers the function (f,) of x out of 
the womb of G,(f,) etc.’ Such remarks as (p. 256) that ‘it will not be necessary to traverse the tortuous 
steps which led Pearson himself to discover the general pattern of his system’ will only have sense if 
the author is substituting something less tortuous himself, and so on. However, Prof. Hogben is clearly 
an enthusiast, and as such much may be forgiven him. 

The writer found a curious archaic echo in such remarks as ‘Since it is the fashion to decry that part 
of statistical theory which rightly pertains to large samples etc.’ Years ago we had large samples with 
K. Pearson and small samples with R. A. Fisher. The writer thought that nowadays statisticians just 
had samples, all of which (given the same initial assumptions) could be treated by the same exact 
techniques. The interpretation of our techniques we shall always wrangle over, but since the final 
interpretation of a probability is a subjective business, this wrangling is possibly inevitable. 

The book is excellently printed and produced. Its price, however, is likely to put it beyond the reach 
of many of those at whom it is aimed. 


F. N. DAVID 


Hypothesis-testing in Time Series Analysis. By Peter Whittle. Uppsala:
Almquist and Wiksells. 1951. Pp. 120. 18s. 


In the present state of development of statistical theory it is taken as axiomatic by some statisticians 
that no test of significance of a statistical hypothesis should be constructed unless the alternatives to 
the hypothesis tested are clearly set out. There are exceptions to this statement, of course, the χ² test
being the classic example. Nevertheless, in general, if a sensitive test is desired, the test is usually constructed
so that a particular class of alternative hypotheses will not be overlooked. Simple illustrations
of this are found in tests for randomness. We cannot define with any exactitude in a single comprehensive 
statement all the facets of randomness, but we can define what we mean by a particular non-random 
departure. Hence in testing for randomness we select from the numerous possible tests that particular 
one which will be most sensitive to that type of departure from randomness we are most anxious not to 
overlook. The battery of tests for randomness is useful statistically simply because the various alternative
hypotheses have been stated.













If we accept the point of view that a statement of the hypotheses alternate to that tested is a prerequisite
to the construction of a test then, in spite of the considerable amount of research which has
been done on time series analysis, the application of the theory can only be described as being in a 
thoroughly unsatisfactory state. The type of analysis carried out appears to depend to a large extent on 
the whim of the person applying the statistical method, and while any test may be better than no test, 
the underlying suggestion is (almost) that any test will do. It is this nebulous situation which Mr Whittle 
has set himself to clarify. He endeavours, for example, to construct tests of the hypothesis that the 
data follow an autoregressive scheme with the alternative that if they do not, then they follow a moving 
average and vice versa. The solutions are only approximate and it is a little difficult to see what would be 
the effects if the several assumptions which are made are not satisfied. However, it is more than sufficient
to say that Mr Whittle is a pioneer and it has always been the fate of pioneers both to stimulate those 
who follow and to be criticized by those who are wise after the event. All who are interested in time 
series will benefit by reading this book if only from the stimulation and excitement which comes from 
trying to go one better than the author. This is an important contribution to the research work on time 
series and may well prove to be the foundation stone of a satisfactory theory. 

Having said this the reviewer may perhaps be permitted to grumble a little. It is pertinent to remark 
on the air of detachment from and seeming unawareness of a certain amount of statistical mathematics
(e.g. §44 on the distribution of quadratic functions) which is well known enough to take for
granted. It is surmised that this book is a doctoral thesis in which ‘padding’ to a greater or less degree
is to be expected. It is unfortunate, however, for the result of such a procedure makes it possible for one 
to overlook the author’s new and important ideas in the restatement of those of other people.

Because the book is so important the real criticism which the reviewer would make is that it is to be 
deplored that the new techniques stand or fall by means of specious arguments based on Bayes’s theorem. 
The author starts off with Bayes’s theorem and as is inevitable, finds himself in the position of having to 
make assumptions regarding the probability of hypotheses. Making arbitrary assumptions and proceeding 
to a limit he arrives at what is in fact the likelihood ratio, when he finds, not surprisingly, that the 
application of his disguised likelihood ratio to certain classical problems produces classical results. It 
would have seemed simpler to skip Bayes and to start off with the likelihood ratio as a principle. 
However, it is always easier to be critical than to create, and research workers for some time to come 
will be grateful to Mr Whittle for introducing them to an interesting set of new ideas. 

F. N. DAVID 


Foundations of the Theory of Probability. By A. N. Kolmogorov. Pp. v+71. $2.25.


Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. By A. Khintchine.
Pp. 76. Chelsea Publishing Company, N.Y. $2.00. 


Probability theory in the past fifty years has owed its development principally to the French and the 
Russian schools. Many probabilists are familiar with the work of the French, and others have been 
made aware of the extent of the Russian contribution by reading the work of Cramér, of Uspensky 
and of Feller. In the two books published by the Chelsea Company we have reprinted works on probability
which must have been authoritative at the time at which they were written (1933). 

Prof. Kolmogorov gives a summary of the axiomatic approach to the theory of probability. Nothing 
is contained here which has not been repeated and extended in the publications edited by M. Borel, but 
the development is clear, vigorous and satisfying and will be enjoyed by all who like the mathematics 
of probability. 

Prof. Khintchine discusses stochastic processes, and his work gives a picture of the elementary 
development which will be found useful even at the present time, although there is little here which has 
not been covered by Feller in recent publications. 

For the student of probability who finds fresh understanding of theory with a fresh point of view, the 
return to beginnings as represented by these reprints will prove a fruitful pilgrimage. Those just starting 
seriously to study probability will find these books most helpful. 

It is unfortunate that it should still be considered necessary to publish Prof. Khintchine’s book ‘by 
Authority of the Attorney General of the United States, etc.’ It is difficult to see that this book could
ever have been regarded as other than pure science. Further, an author should receive royalties for his
work.

F. N. DAVID 










Statistical Methodology Reviews, 1941-1950. Edited by Oscar Krisen Buros.
x+457 pp. New York: John Wiley and Sons. $7.00; London: Chapman and Hall. 56s.


This handsomely produced collection of reviews includes extracts from fifteen reviews of its predecessor,
which reveal the general appreciation of the earlier work of Prof. Buros. This later volume,
confined more strictly to methodological statistical books, can hardly be less appreciated by the 
statistical community. Its main value will be as a preliminary guide to anyone who has to choose 
between text-books he does not know; an inspection of these reviews will sometimes shorten the list 
of books which must actually be examined. By professional statisticians, and in particular the 
post-war generation, the book will also be used to satisfy justifiable curiosity concerning reviewers. 
Despite the greater selectivity of his latest work, Prof. Buros has still provided us with ample opportunity
to compare individual judgments, and, occasionally, to indulge in a little Schadenfreude.

A. STUART 


Editorial Note. 


It has been brought to the Editors’ attention that the result obtained in the paper ‘Note on the 
inversion theorem’ by J. Gil-Pelaez (Biometrika, December 1951, pp. 481-2) is a special case of results 
obtained in the paper ‘Inversion formulae for the distribution of ratios’ by John Gurland (Annals of 
Mathematical Statistics, June 1948, pp. 228-37). 








(All Rights reserved)
BIOMETRIKA. Vol. 39, Parts 1 and 2 
CONTENTS 


Moment coefficients of the k-statistics in samples from a finite population. By John Wishart
Moment-statistics in samples from a finite population. By M. G. Kendall
Some exact tests in multivariate analysis. By E. J. Williams
The construction of balanced designs for experiments involving sequences of treatments. By H. D. Patterson
Multi-factor designs of first order. By G. E. P. Box
Tests of significance in canonical analysis. By F. H. C. Marriott
The interpretation of interactions in factorial experiments. By E. J. Williams
On sampling from a population of rankers. By A. S. C. Ehrenberg
Least-squares estimation of location and scale parameters using order statistics. By E. H. Lloyd
Regression, structure and functional relationship. Part II. By M. G. Kendall
On the concurrence of a set of regression lines. By K. D. Tocher
A sampling test of the χ² theory for probability chains. By M. S. Bartlett
On mathematical analysis of style. By Wilhelm Fucks
Comparison of two approximations to the distribution of the range in small samples from normal populations. By E. S. Pearson
The covering circle of a sample from a circular normal distribution. By H. E. Daniels
The frequency justification of certain sequential tests. By G. A. Barnard
Experimental designs for serially correlated observations. By R. M. Williams
The time intervals between industrial accidents. By B. A. Maguire, E. S. Pearson and A. H. A. Wynn
The estimation of death-rates from capture-mark-recapture sampling. By P. A. P. Moran

MISCELLANEA

A note on the design problem. By K. D. Tocher
Tables of percentage points of the ‘Studentized’ extreme deviate from the sample mean. By K. R. Nair
Extended and corrected tables of the upper percentage points of the ‘Studentized’ range. By Joyce M. May
On the distribution of ‘Studentized’ range. By K. C. S. Pillai
Note on a certain family of discrete distributions. By J. S. Maritz
Some properties of runs in smoothed random series. By Alison M. Grant
An approximation to the symmetrical incomplete beta function. By J. H. Cadwell
The distribution of quantiles of small samples. By J. H. Cadwell

On a correction term in the method of paired comparisons. By J. A. van der Heiden

REVIEWS

L. Hogben’s ‘Chance and Choice by Cardpack and Chessboard’
P. Whittle’s ‘Hypothesis-testing in Time Series Analysis’
A. N. Kolmogorov’s ‘Foundations of the Theory of Probability’
A. Khintchine’s ‘Asymptotische Gesetze der Wahrscheinlichkeitsrechnung’
O. K. Buros’s (ed.) ‘Statistical Methodology Reviews’

EDITORIAL NOTE


First printed in Great Britain at the University Press, Cambridge 
Reprinted by offset-litho by Percy Lund, Humphries & Co., Ltd, Bradford










